CN116679985A

CN116679985A - Processing method, system, device, equipment and storage medium for loop branch instruction

Info

Publication number: CN116679985A
Application number: CN202310945145.8A
Authority: CN
Inventors: 薛臻; 勾凌睿; 陈键; 唐丹; 包云岗
Original assignee: Beijing Open Source Chip Research Institute
Current assignee: Beijing Open Source Chip Research Institute
Priority date: 2023-07-28
Filing date: 2023-07-28
Publication date: 2023-09-01
Anticipated expiration: 2043-07-28
Also published as: CN116679985B

Abstract

The application provides a processing method, system, and apparatus, an electronic device, and a computer-readable storage medium for loop branch instructions. The method comprises: obtaining an error recovery request for a loop branch instruction, wherein the error recovery request is issued by a processor after it obtains an actual execution result of the loop branch instruction and includes the actual execution result and a target iteration count value; obtaining state information of the loop branch instruction from a preset cache unit, the state information reflecting the execution status of the loop branch instruction within the loop; and updating the state information of the loop branch instruction in the cache unit according to the actual execution result and the target iteration count value, thereby completing error recovery of the loop branch instruction. By starting error recovery earlier, the application saves the time otherwise spent waiting for the instruction to commit, suits high-performance processors, keeps the internal state of the loop predictor correctly maintained, and improves the accuracy and efficiency of prediction.

Description

Processing method, system, device, equipment and storage medium for loop branch instruction

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, a system, an apparatus, an electronic device, and a computer readable storage medium for processing a loop branch instruction.

Background

Loop branch instructions are common in computer programs. Such an instruction chooses between "jump to a particular address" and "continue sequential execution", thereby controlling the direction of instruction execution in a loop.

In a loop with a fixed number of iterations, the execution direction of every iteration except the last is "jump", and the execution direction of the last iteration is "no jump". Because the actual execution result of a loop branch instruction is known only after the processor executes it, the execution direction can be predicted beforehand so that the instructions following the branch are executed in advance, which avoids idle waiting in the pipeline and improves the operating speed of the processor. In the prior art, after the loop branch instruction is committed, the actual execution result obtained by the processor is compared with the prediction result to determine whether a misprediction occurred; if so, the internal state of the predictor is corrected, thereby recovering from the misprediction and improving prediction performance.

However, high-performance processors have large instruction windows and deep pipelines, so at any moment many instructions in different states are in flight in the pipeline. Because the current scheme performs misprediction recovery only after the instruction is committed, the predictor stays in an erroneous state from the misprediction until the commit and cannot supply correct instructions to the back end of the pipeline, which reduces processor performance.

Disclosure of Invention

Embodiments of the present application provide a method, a system, an apparatus, an electronic device, and a computer readable storage medium for processing a loop branch instruction, so as to solve the problems in the related art.

In a first aspect, an embodiment of the present application provides a method for processing a loop branch instruction, where the method includes:

obtaining an error recovery request for a loop branch instruction, wherein the error recovery request is sent by a processor after obtaining an actual execution result of the loop branch instruction, and the error recovery request comprises the actual execution result and a target iteration count value; the target iteration count value reflects the number of iterations that the loop branch instruction has speculatively executed;

acquiring state information of the loop branch instruction from a preset cache unit; the state information reflects the execution condition of the loop branch instruction in the loop;

and updating the state information of the loop branch instruction in the cache unit according to the actual execution result and the target iteration count value, and completing error recovery of the loop branch instruction.

In a second aspect, an embodiment of the present application provides a processing system for a loop branch instruction, the system comprising:

A loop predictor, a main predictor, and a multiplexer;

the loop predictor includes: a prediction unit, a recovery unit and a first cache unit;

the prediction unit is used for predicting the loop branch instruction to be predicted according to the state information of the loop branch instruction recorded in the first cache unit, so as to obtain a first prediction result;

the recovery unit is used for updating the state information of the loop branch instruction in the first cache unit according to the actual execution result and the target iteration count value contained in the error recovery request under the condition that the error recovery request for the loop branch instruction is received, and completing the error recovery of the loop branch instruction;

the main predictor is used for obtaining a second prediction result according to the historical execution record of the loop branch instruction;

the multiplexer is used for selecting the first prediction result or the second prediction result to output.

In a third aspect, an embodiment of the present application provides a processing apparatus for a loop branch instruction, the apparatus including:

a first acquisition module, configured to acquire an error recovery request for a loop branch instruction, where the error recovery request is issued by a processor after an actual execution result of the loop branch instruction is obtained, and the error recovery request includes the actual execution result and a target iteration count value; the target iteration count value reflects the number of iterations that the loop branch instruction has speculatively executed;

a second acquisition module, configured to acquire the state information of the loop branch instruction from a preset cache unit; the state information reflects the execution condition of the loop branch instruction in the loop;

and a recovery module, configured to update the state information of the loop branch instruction in the cache unit according to the actual execution result and the target iteration count value to complete error recovery of the loop branch instruction.

In a fourth aspect, an embodiment of the present application further provides an electronic device, including a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the method of the first aspect.

In a fifth aspect, embodiments of the present application also provide a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the method of the first aspect.

The embodiment of the application can optimize the processing time of error recovery, starts error recovery processing when the processor obtains the actual execution result of the loop branch instruction and sends an error recovery request, and updates the state information of the loop branch instruction in the cache unit of the loop predictor through the actual execution result and the target iteration count value in the error recovery request to finish error recovery of the loop branch instruction.

The foregoing is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the application are more readily apparent, specific embodiments are set out below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a schematic diagram of a processing system for a loop branch instruction according to an embodiment of the present application;

FIG. 2 is a flow chart of steps of a method for processing a loop branch instruction according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating steps of a method for processing a loop branch instruction according to an embodiment of the present application;

FIG. 4 is a block diagram of a loop predictor provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a loop branch instruction prediction process according to an embodiment of the present application;

FIG. 6 is a block diagram of a processing apparatus for loop branch instructions according to an embodiment of the present application;

FIG. 7 is a block diagram of an electronic device provided by an embodiment of the present application;

FIG. 8 is a block diagram of another electronic device in accordance with another embodiment of the application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable where appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and their number is not limited; for example, the first object may be one or more. Furthermore, the term "and/or" as used in the specification and claims describes an association between associated objects and covers three relationships; e.g., "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The term "plurality" in embodiments of the present application means two or more, and other quantifiers are similar.

Term interpretation:

Commit (submission): the stage of an instruction that follows the execution stage. Once the operation specified by the instruction is marked as ready to commit, a signal can be sent to release the resources occupied by the instruction, such as physical registers; the instruction is then considered committed, and its life cycle formally ends.

Loop branch instruction: a branch instruction executed in a loop with a fixed number of iterations. It resides in the loop body of the program and controls the direction of instruction execution by choosing between "jump to a specific address" (jump) and "continue sequential execution" (no jump), but the actual choice is known only after execution of the instruction completes. A jump means that the address obtained by executing the loop branch instruction is not the next sequential address, so execution must jump to that address; no jump means that the address obtained by executing the loop branch instruction is the next sequential address, which is executed directly without jumping.

Branch prediction unit (BPU, Branch Prediction Unit): predicts branch instructions, such as loop branch instructions, to obtain a prediction result. The branch prediction unit may include a plurality of branch predictors whose individual predictions together produce the prediction of the whole unit, which is usually located at the front end of the pipeline.

Loop branch Predictor (Loop Predictor): the loop branch instruction is predicted by recording the current iteration number and the total iteration number of a loop.

Correct path and speculative path: a processor executes multiple instructions at the same time, and these instructions have a strict order. Some of them, such as loop branch instructions, cause the execution path to fork, and only one of the resulting paths will eventually be executed. To improve performance, the processor may guess the execution direction and execute the instructions on one path, which is the speculative path. The instructions on the speculative path may turn out to be invalid: when the branch instruction that gave rise to the speculative path finishes executing and the speculated direction matches the execution result, the speculative path becomes the correct path; when the speculated direction differs from the execution result, the prediction is considered to be in error.

Referring to FIG. 1, which is a schematic diagram of a processing system for a loop branch instruction according to an embodiment of the present application, the system includes: a loop predictor, a main predictor and a multiplexer.

The loop branch predictor predicts the loop branch instruction by recording the current iteration times and the total iteration times of the loop branch instruction; the main predictor is an integral part of the branch prediction unit, and the main predictor is used for performing branch prediction by utilizing the historical execution record of the loop branch instruction to obtain a second prediction result of the loop branch instruction. In the embodiment of the present application, the loop branch instruction is a branch instruction executed in a loop with a fixed number of iterations, for example, assuming that the loop has 100 iterations, the loop branch instruction should perform "jump" in the first 99 iterations, and the loop branch instruction should perform "no jump" in the last iteration, thereby completing the loop.
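As a minimal illustration of this pattern (a sketch only; the function name `loop_branch_outcomes` is hypothetical and does not appear in the application), the direction of a loop branch instruction over one pass through a loop with a fixed number of iterations can be enumerated as follows:

```python
def loop_branch_outcomes(total_iterations: int) -> list[bool]:
    """Return the branch direction per iteration: True = jump, False = no jump."""
    if total_iterations < 1:
        raise ValueError("a loop needs at least one iteration")
    # Every iteration except the last jumps back; the last one falls through.
    return [True] * (total_iterations - 1) + [False]


# The 100-iteration example from the text: 99 jumps followed by one no-jump.
outcomes = loop_branch_outcomes(100)
```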

In the related art, since the main predictor has a limitation of hardware specification, there is a limitation of the data length of the history execution record supported by the main predictor, and when a loop having more iterations is faced, the main predictor cannot completely record all iterations of one loop, so that it is difficult for the main predictor to correctly predict the branch direction of the last iteration of the loop branch instruction as not jumping.
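The limitation can be seen in a deliberately simplified toy model, which is an assumption made here for illustration and not the main predictor of the application: if a predictor sees only the most recent `history_length` outcomes of the branch, then for a loop longer than that window the history before the last iteration looks the same as the history before an earlier one. The helper name `history_before` is hypothetical.

```python
def history_before(iteration: int, history_length: int) -> tuple[bool, ...]:
    """Outcomes (True = jump) visible to a history-limited predictor just
    before the 1-based `iteration` of a single pass through the loop."""
    # Every earlier iteration of the same pass was a jump.
    earlier = [True] * (iteration - 1)
    # Only the most recent `history_length` outcomes are retained.
    return tuple(earlier[-history_length:]) if history_length else ()


# With a 16-entry history, iteration 50 and the final iteration 100 of a
# 100-iteration loop present identical histories to the predictor.
same_with_short_history = history_before(50, 16) == history_before(100, 16)
# With a history longer than the loop, the two histories differ.
same_with_long_history = history_before(50, 128) == history_before(100, 128)
```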

To solve the above problem, referring to FIG. 1, the embodiment of the present application adds a loop predictor alongside the main predictor to predict the branch direction of loop branch instructions in a program. The loop predictor uses counters to record the current iteration count of the loop branch instruction (which iteration the loop has reached) and the total number of iterations, and predicts the loop branch instruction by comparing the two counters: when the current iteration count value equals the total number of iterations, the prediction result is no jump; when the current iteration count value differs from the total number of iterations, the prediction result is a jump.

Further, the multiplexer may select either the first prediction result of the loop predictor or the second prediction result of the main predictor for output. Generally, a confidence value is maintained in the loop predictor and is positively correlated with the accuracy of the first prediction result; the multiplexer may output the first prediction result if the confidence value is greater than a preset threshold, and output the second prediction result if the confidence value is less than or equal to the preset threshold.
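The selection rule of the multiplexer can be sketched as follows. This is only an illustration under assumed names (`select_prediction`, `confidence`, `threshold`) and an assumed integer confidence; the application does not prescribe a particular encoding.

```python
def select_prediction(first_prediction: bool, second_prediction: bool,
                      confidence: int, threshold: int) -> bool:
    """Choose between the loop predictor's result (first_prediction) and the
    main predictor's result (second_prediction). True = jump."""
    # The loop predictor wins only when its confidence exceeds the threshold.
    if confidence > threshold:
        return first_prediction
    # Otherwise, including the equal case, fall back to the main predictor.
    return second_prediction


# Confident loop predictor says "no jump"; main predictor says "jump".
chosen_when_confident = select_prediction(False, True, confidence=3, threshold=2)
chosen_when_unsure = select_prediction(False, True, confidence=2, threshold=2)
```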

FIG. 2 is a flowchart illustrating a method for processing a loop branch instruction according to an embodiment of the present application, where, as shown in FIG. 2, the method may include:

step 101, obtain error recovery request for loop branch instruction.

The error recovery request is sent by the processor after the actual execution result of the loop branch instruction is obtained, and comprises the actual execution result and a target iteration count value; the target iteration count value reflects the number of iterations that the loop branch instruction has speculatively executed.

In the operation of the system architecture shown in FIG. 1, the accuracy of prediction for the loop branch instruction depends on the loop predictor maintaining its count values correctly, and in particular on error recovery after a prediction error. A prediction error means that the prediction result of the loop predictor for the loop branch instruction differs from the execution result obtained when the processor actually executes the loop branch instruction; the prediction is then considered wrong, and error recovery is performed by updating the count value maintained in the loop predictor.

In the related art, error recovery is performed after the loop branch instruction is committed: if the actual execution result obtained by the processor and the prediction result show that a misprediction occurred, the internal state of the predictor is corrected to recover from the misprediction. However, a high-performance processor has a large instruction window and a deep pipeline, so the interval between executing and committing an instruction is long. The predictor in the related art therefore remains in an erroneous state for a long period, from the misprediction until the instruction commits, and cannot provide correct instruction predictions for the back end of the pipeline, which reduces processor performance.

Therefore, the embodiment of the present application moves error recovery earlier. Specifically, error recovery starts as soon as the processor obtains the actual execution result of the loop branch instruction, because that result already suffices to determine whether a prediction error exists; there is no need to wait for the instruction to commit. Starting error recovery earlier saves the time spent waiting for the commit, suits high-performance processors, keeps the internal state of the loop predictor correctly maintained, and improves the accuracy and efficiency of prediction.

In this step, when the actual execution result of the loop branch instruction is obtained by executing the loop branch instruction, the processor may already determine whether there is a misprediction phenomenon by comparing the actual execution result of the loop branch instruction with the prediction result of the previous loop predictor, so that, after obtaining the actual execution result of the loop branch instruction, the processor may issue an error recovery request including the actual execution result and the target iteration count value to the loop predictor, and the loop predictor may be informed to start error recovery. Wherein the target iteration count value reflects the number of iterations that the currently executed loop branch instruction with prediction errors has speculatively executed.
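A minimal sketch of this step is given below. The names `ErrorRecoveryRequest` and `on_branch_executed` are hypothetical, and the sketch assumes that a request is issued only when the prediction and the actual result differ; the application itself specifies only that the request carries the actual execution result and the target iteration count value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorRecoveryRequest:
    actual_taken: bool           # actual execution result: True = jump
    target_iteration_count: int  # iteration count carried with the prediction


def on_branch_executed(predicted_taken: bool, actual_taken: bool,
                       target_iteration_count: int) -> Optional[ErrorRecoveryRequest]:
    """Called at execution time, before commit, for a loop branch instruction."""
    # The comparison needs only the execution result, not the commit.
    if predicted_taken == actual_taken:
        return None
    return ErrorRecoveryRequest(actual_taken, target_iteration_count)


# Predicted "no jump" at iteration count 55, but the branch actually jumped.
request = on_branch_executed(False, True, 55)
no_request = on_branch_executed(True, True, 7)
```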

Step 102, acquiring state information of the loop branch instruction from a preset cache unit; the state information reflects the execution of the loop branch instruction in the loop.

In the embodiment of the application, the state information of the loop branch instruction can be maintained in the cache unit of the loop predictor. The state information mainly reflects the execution condition of the loop branch instruction in the loop, such as the total number of iterations contained in the loop and the number of iterations the loop branch instruction has currently executed, and the prediction result of the loop predictor is obtained from this state information. A prediction error therefore indicates that the state information is wrong, and the error recovery mechanism updates the state information when a prediction error occurs so that the loop predictor can subsequently predict normally. If error recovery is not performed in time, the loop predictor remains in an erroneous state for a long time and cannot provide correct instruction predictions for the back end of the pipeline.

Step 103, updating the state information of the loop branch instruction in the cache unit according to the actual execution result and the target iteration count value, and completing error recovery of the loop branch instruction.

In the embodiment of the application, since the actual execution result (that is, the result reflecting that the loop branch instruction executes the jump or does not jump) is the result obtained by the processor actually executing the loop branch instruction, and the target iteration count value is the speculative count of the loop predictor at the time, the actual execution result reflects the actual execution state of the loop branch instruction, and when error recovery is performed, the actual execution result and the target iteration count value can be adopted to update the state information of the loop branch instruction in the cache unit, thereby eliminating the error content in the state information of the loop branch instruction, enabling the follow-up loop predictor to normally execute the loop, and realizing recovery of the loop branch instruction prediction error.

In summary, the embodiment of the application can optimize the processing time of error recovery, start error recovery processing when the processor obtains the actual execution result of the loop branch instruction and issues an error recovery request, update the state information of the loop branch instruction in the cache unit of the loop predictor through the actual execution result and the target iteration count value in the error recovery request, and complete the error recovery of the loop branch instruction.

FIG. 3 is a flowchart illustrating specific steps of a method for processing a loop branch instruction according to an embodiment of the present application, where, as shown in FIG. 3, the method may include:

step 201, obtain an error recovery request for a loop branch instruction.

This step may refer to step 101, which is not described herein.

Step 202, acquiring state information of the loop branch instruction from a preset cache unit; the state information reflects the execution of the loop branch instruction in the loop.

This step may refer to step 102 described above, and will not be described here.

Step 203, resetting the current iteration count value of the state information of the loop branch instruction in the cache unit to 0 when the actual execution result is no jump.

Wherein the status information includes: the current iteration count of the loop branch instruction.

In the embodiment of the application, since the loop branch instruction is a branch instruction executed in a loop with a fixed iteration number, assuming that the loop has n iteration numbers, the loop branch instruction should perform "jump" in 1 to n-1 iterations, and the loop branch instruction should perform "no jump" in the nth iteration, thereby completing the loop.

Specifically, the cache unit of the loop predictor may maintain state information of the loop branch instruction, where the state information may include a current iteration count value that records a number of iterations currently performed by the loop branch instruction. When the loop predictor generates a prediction error and the actual execution result of the processor executing the loop branch instruction is not jumping, the error prediction result of the loop predictor for the loop branch instruction is considered to be jumping, but the actual execution result of the loop branch instruction is not jumping, which means that the loop branch instruction has completed all iterations in the loop, and when the loop branch instruction has completed all iterations in the loop, the current iteration count value of the loop branch instruction in the cache unit should be reset to 0 to represent the end of the loop. The current iteration count value of the state information of the loop branch instruction in the cache unit can be reset to 0 for error recovery under the condition that the actual execution result is not jump, so that the correct maintenance of the internal state of the loop predictor is ensured, and the accuracy and the efficiency of prediction are improved.

Step 204, modifying the current iteration count value into the target iteration count value under the condition that the actual execution result is a jump.

Specifically, the error recovery request further includes a target iteration count value of the mispredicted loop branch instruction. The target iteration count value records the number of iterations and is passed to the back end of the pipeline together with the prediction result when the loop predictor predicts a given iteration of the loop branch instruction. For example, when the loop predictor predicts the 55th iteration of the loop branch instruction, a target iteration count value of 55 is passed to the back end of the pipeline along with the prediction result.

When the loop predictor has a prediction error and the actual execution result of the processor executing the loop branch instruction is a jump, the error prediction result of the loop predictor for the loop branch instruction is considered to be non-jump, but the actual execution result of the loop branch instruction is a jump, which means that the loop branch instruction has not completed all iterations in the loop, but the current iteration count value of the loop branch instruction in the cache unit is reset to 0 due to the occurrence of the error prediction by the loop predictor. The current iteration count value can be modified to the target iteration count value for error recovery under the condition that the actual execution result is jump so as to characterize the loop to be continuously executed from the iteration characterized by the target iteration count value, thereby ensuring the correct maintenance of the internal state of the loop predictor and improving the accuracy and efficiency of the prediction.
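Steps 203 and 204 can be sketched together as follows. The class `LoopEntry` and the function `recover` are hypothetical names introduced for illustration; only the two update rules come from the description above.

```python
from dataclasses import dataclass


@dataclass
class LoopEntry:
    current_count: int      # current iteration count value
    total_iterations: int   # total number of iterations of the loop


def recover(entry: LoopEntry, actual_taken: bool,
            target_iteration_count: int) -> None:
    """Update the state information after a misprediction."""
    if not actual_taken:
        # Step 203: the loop has actually ended, so the count restarts at 0.
        entry.current_count = 0
    else:
        # Step 204: the loop is still running; resume from the count that
        # travelled down the pipeline with the mispredicted branch.
        entry.current_count = target_iteration_count


# Wrongly predicted "no jump" (count already reset to 0); the branch jumped.
entry_a = LoopEntry(current_count=0, total_iterations=100)
recover(entry_a, actual_taken=True, target_iteration_count=55)

# Wrongly predicted "jump" (count kept advancing); the loop actually ended.
entry_b = LoopEntry(current_count=57, total_iterations=100)
recover(entry_b, actual_taken=False, target_iteration_count=56)
```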

Optionally, before step 201, steps 205-206 may also be included:

step 205, for a loop branch instruction to be predicted, constructing state information corresponding to the loop branch instruction in a cache unit, where the state information includes: current iteration count and total number of iterations.

Step 206, obtaining a first prediction result by comparing the current iteration count value with the total iteration number.

In an embodiment of the present application, for steps 205-206, since the loop branch instruction is a branch instruction executed in a loop having a fixed number of iterations, assuming the loop has n iterations, the loop branch instruction should perform a "jump" from 1 to n-1 iterations, and the loop branch instruction should perform a "no jump" from the nth iteration, thereby completing the loop.

Based on the above characteristics, the state information of the loop branch instruction may be maintained in the cache unit of the loop predictor. The state information includes the current iteration count value and the total number of iterations: the current iteration count value records the number of iterations the loop branch instruction has currently executed and is incremented by 1 after each iteration is executed; the total number of iterations reflects how many iterations need to be performed in the loop. The loop predictor may obtain the first prediction result based on the maintained state information, specifically by comparing the current iteration count value in the state information with the total number of iterations.

Optionally, in one implementation, step 206 may specifically include sub-steps 2061-2062:

substep 2061, determining that the first prediction result is not skip when the current iteration count value is the same as the total iteration number.

Substep 2062, determining that the first prediction result is a jump when the current iteration count value and the total iteration number are different.

In an embodiment of the present application, for sub-steps 2061-2062, the loop branch instruction is executed in a loop having a fixed number of iterations n, and it should "jump" in iterations 1 to n-1 and "not jump" in the nth iteration. When predicting, the loop predictor may therefore compare the current iteration count value in the state information of the loop branch instruction being predicted with the total number of iterations. If the two are the same, all iterations of the loop have been completed, and the first prediction result is determined to be no jump; if the two are different, not all iterations of the loop have been completed, and the first prediction result is determined to be a jump.
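The comparison in sub-steps 2061-2062 amounts to a single inequality test. The sketch below uses the hypothetical function name `first_prediction` and represents "jump" as `True`:

```python
def first_prediction(current_count: int, total_iterations: int) -> bool:
    """Return the loop predictor's first prediction result: True = jump."""
    # Equal counters mean the loop is finished (no jump); otherwise jump.
    return current_count != total_iterations


mid_loop = first_prediction(current_count=54, total_iterations=100)
loop_exit = first_prediction(current_count=100, total_iterations=100)
```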

Optionally, step 206 may further include substeps 2063-2064:

Sub-step 2063 resets the current iteration count value in the state information of the loop branch instruction to 0 if it is determined that the first prediction result is not to jump.

Sub-step 2064, in the event that the first prediction result is determined to be a jump, incrementing a current iteration count value in the state information of the loop branch instruction by a preset number of steps.

In an embodiment of the present application, for sub-steps 2063-2064, where the loop predictor obtains that the first prediction result of the loop branch instruction is non-skip, the loop predictor may consider that the loop branch instruction has performed all iterations in the loop, and the loop predictor may reset the current iteration count value in the state information of the loop branch instruction to 0 to characterize the end of the loop.

When the loop predictor obtains that the first prediction result of the loop branch instruction is jump, the loop predictor can consider that the loop branch instruction has not executed all iterations in the loop, and the loop predictor can increment the current iteration count value in the state information of the loop branch instruction according to the preset step number so as to represent that the loop is still continuing.
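As a non-limiting illustration, sub-steps 2061-2064 may be modelled in software roughly as follows. The sketch is written in Python for readability only; the names `LoopEntry`, `predict` and `step` are hypothetical and do not appear in the application, and how the counters are initialised relative to the first iteration is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class LoopEntry:
    """Hypothetical state information for one loop branch instruction."""
    current_count: int  # current iteration count value
    total_count: int    # total iteration number


def predict(entry: LoopEntry, step: int = 1) -> bool:
    """Return True for "jump" and False for "no jump", updating the counter."""
    if entry.current_count == entry.total_count:
        # Sub-steps 2061 and 2063: counts match, predict no jump and reset to 0.
        entry.current_count = 0
        return False
    # Sub-steps 2062 and 2064: counts differ, predict jump and advance by `step`.
    entry.current_count += step
    return True
```

In this model the counter advances on every "jump" prediction and returns to 0 on the "no jump" prediction, so the same entry can be reused the next time the loop is entered.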

It should be noted that the loop predictor may implement prediction of two or more steps, so that it may supply the instructions of two or more loop iterations to the back end at a time, which greatly improves the address-supplying capability. Taking a preset step number of n as an example (here n denotes the step number, not the total iteration number): when the loop predictor determines from the state information of the loop branch instruction that the number of remaining iterations is greater than n, it increases the current iteration count value by n during prediction (n is generally an integer such as 2 or 3), and the loop cache simultaneously provides n copies of the loop body instructions, thereby realizing multi-step branch prediction and supplying multiple sets of addresses.
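As a non-limiting illustration, the multi-step case may be sketched as follows. The function name `predict_multi` is hypothetical. The text above only specifies the behaviour when more than n iterations remain; falling back to a single step otherwise is an assumption made for this sketch.

```python
from typing import Tuple


def predict_multi(current_count: int, total_count: int, step: int) -> Tuple[int, int]:
    """Return (copies of the loop body to supply, new current iteration count)."""
    remaining = total_count - current_count
    if remaining > step:
        # More than `step` iterations remain: advance by `step` and
        # let the loop cache supply `step` copies of the loop body.
        return step, current_count + step
    if remaining > 0:
        # Assumption (not stated in the text): fall back to one step.
        return 1, current_count + 1
    # Counts match: the loop is finished, supply nothing and reset to 0.
    return 0, 0
```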

Optionally, in another implementation, the state information further includes: a confidence value of the loop branch instruction and a first tag, the confidence value being positively correlated with the accuracy of the first prediction result, and step 206 may specifically include sub-step 2065:

Sub-step 2065, obtaining a first prediction result by comparing the current iteration count value with the total iteration number if it is determined that the confidence value in the state information of the loop branch instruction is greater than a preset threshold and a second tag carried by the loop branch instruction matches the first tag.

In the embodiment of the present application, in order to improve the accuracy of the prediction process, the state information of the loop branch instruction further includes: a confidence value of the loop branch instruction and a first tag. The confidence value reflects the prediction accuracy of the loop predictor for the loop branch instruction: the higher the confidence value, the higher the prediction accuracy. The first tag represents the identity of the loop branch instruction and can be used to verify that the data belongs to that instruction; correspondingly, the loop branch instruction carries a second tag representing its identity.

Before prediction, the second tag carried by the loop branch instruction may be compared with the first tag stored in the state information in the cache unit of the loop predictor, and the confidence value may be compared with the preset threshold. When the second tag is the same as the first tag and the confidence value is greater than the preset threshold, the state information acquired for the loop branch instruction is considered correct and the prediction accuracy can be ensured, so the current iteration count value is compared with the total iteration number to obtain the first prediction result. When the second tag is different from the first tag, the state information acquired by the loop predictor is considered erroneous, and no prediction is made until the error is eliminated. When the confidence value is less than or equal to the preset threshold, the prediction accuracy of the loop predictor for the loop branch instruction is considered insufficient, and the loop predictor does not make a prediction for the time being.
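As a non-limiting illustration, the gating described for sub-step 2065 may be sketched as follows. The names `TaggedLoopEntry` and `gated_predict` are hypothetical, and returning `None` to represent "no prediction is made" is a modelling choice of this sketch rather than something stated in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedLoopEntry:
    """Hypothetical state information including the first tag and confidence."""
    tag: int            # first tag stored in the cache unit
    confidence: int     # confidence value
    current_count: int  # current iteration count value
    total_count: int    # total iteration number


def gated_predict(entry: TaggedLoopEntry, branch_tag: int,
                  threshold: int) -> Optional[bool]:
    """Return True (jump), False (no jump) or None (no prediction)."""
    if entry.tag != branch_tag:
        # Second tag does not match the first tag: state is not trusted.
        return None
    if entry.confidence <= threshold:
        # Confidence is not above the preset threshold: do not predict yet.
        return None
    # Both checks passed: compare the two counters.
    return entry.current_count != entry.total_count
```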

Optionally, the method further comprises steps 207-209:

Step 207, when the loop branch instruction is committed, acquiring an actual execution result of the loop branch instruction, and recording a committed iteration count value of the loop branch instruction in the cache unit.

Step 208, if the actual execution result is a jump, incrementing the committed iteration count value.

Step 209, setting the total iteration number of the loop branch instruction in the cache unit as the committed iteration count value if the actual execution result is not jump.

In the embodiment of the present application, for steps 207-209, the commit of the loop branch instruction indicates that the instruction has formally ended its life cycle. At this point the processor has executed the loop branch instruction and obtained its actual execution result, and the loop predictor may, when the loop branch instruction is committed, update the committed iteration count value of the loop branch instruction in the cache unit according to the actual execution result.

For example, for a loop with a total iteration number of 100, if the loop branch instruction is executed and committed in the 55th iteration and the actual execution result is a jump, the committed iteration count value recorded by the loop predictor in the cache unit is 55. If the loop branch instruction is executed and committed in the 100th iteration and the actual execution result is no jump, the committed iteration count value recorded in the cache unit is 100, and the total iteration number in the state information of the loop branch instruction in the cache unit is set to 100. The total iteration number is generated and maintained in this way.
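As a non-limiting illustration, steps 207-209 may be sketched as follows. The names `CommitState` and `on_commit` are hypothetical. Two assumptions are made that the text does not state: the committed iteration count value holds the index of the iteration currently being committed and therefore starts at 1, and it is restarted at 1 after a no-jump commit.

```python
from dataclasses import dataclass


@dataclass
class CommitState:
    """Hypothetical commit-side record kept in the cache unit."""
    committed_count: int = 1  # assumption: index of the iteration being committed
    total_count: int = 0      # assumption: 0 means "not learned yet"


def on_commit(state: CommitState, taken: bool) -> None:
    """Update the record when the loop branch instruction is committed."""
    if taken:
        # Step 208: actual result is a jump, increment the committed count.
        state.committed_count += 1
    else:
        # Step 209: actual result is no jump, record the total iteration number.
        state.total_count = state.committed_count
        # Assumption: restart counting for the next run of the loop.
        state.committed_count = 1
```

Under these assumed initial values, 99 jump commits followed by one no-jump commit leave `total_count` at 100, which agrees with the example above.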

Optionally, the state information further includes: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result, and the method further comprises steps 210-212:

Step 210, when the current total iteration number is set, acquiring the total iteration number that was set last time.

Step 211, incrementing the confidence value when the current total iteration number is the same as the last set total iteration number.

Step 212, decrementing the confidence value if the current total iteration number is different from the last set total iteration number.

In the embodiment of the present application, for steps 210 to 212, the confidence value is an important index reflecting the prediction precision of the loop predictor, and for maintenance of the confidence value, the embodiment of the present application may first obtain the total iteration number set last for the loop branch instruction under the condition that the current total iteration number of the loop branch instruction is set, and compare whether the current total iteration number is the same as the total iteration number set last. Under the condition that the current total iteration times are the same as the last set total iteration times, the performance of the loop predictor is considered to be better, and the confidence value can be increased; and under the condition that the current total iteration times are different from the last set total iteration times, the performance of the loop predictor is considered to be poor, and the confidence value is decremented.
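As a non-limiting illustration, steps 210-212 may be sketched as follows. The function name `update_confidence` is hypothetical, and clamping the confidence value to the range 0 to `max_confidence` is an assumption of this sketch; the application does not specify the width of the confidence counter.

```python
def update_confidence(confidence: int, last_total: int, current_total: int,
                      max_confidence: int = 3) -> int:
    """Return the new confidence value after the total iteration number is set."""
    if current_total == last_total:
        # Step 211: same total as last time, increment (assumed to saturate).
        return min(confidence + 1, max_confidence)
    # Step 212: total changed, decrement (assumed not to go below 0).
    return max(confidence - 1, 0)
```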

FIG. 1 is a block diagram of a processing system for a loop branch instruction according to an embodiment of the present application, the system comprising: a loop predictor, a main predictor, and a multiplexer;

Further, referring to FIG. 4, FIG. 4 is a block diagram of a loop predictor according to an embodiment of the present application, where the loop predictor includes: a prediction unit, a recovery unit and a first cache unit;

In the embodiment of the application, the main predictor mainly performs branch prediction by using the history execution record of the loop branch instruction to obtain the second prediction result of the loop branch instruction. In addition, on the basis of the main predictor, a loop predictor can be further added to predict the branch direction of a loop branch instruction existing in a loop in a program, the loop predictor records the current iteration number (which iteration the loop is performed to) and the total iteration number of the loop branch instruction through counters, and the branch prediction of the loop branch instruction can be made by comparing the values of the two counters.

Further, the state information of the loop branch instruction may be maintained in the first cache unit of the loop predictor. The state information mainly reflects the execution condition of the loop branch instruction in the loop, such as the total iteration number of the loop and the iteration currently being executed by the loop branch instruction. The prediction unit is configured to predict the loop branch instruction to be predicted according to the state information of the loop branch instruction recorded in the first cache unit, so as to obtain a first prediction result.

In addition, the embodiment of the application can further optimize the timing of error recovery. Specifically, error recovery is started as soon as the processor obtains the actual execution result of the loop branch instruction: at that moment the actual execution result already determines whether a prediction error exists, so there is no need to wait for the instruction to be committed. With error recovery started earlier in this way, the recovery unit is configured to, upon receiving an error recovery request for the loop branch instruction, update the state information of the loop branch instruction in the first cache unit according to the actual execution result and the target iteration count value contained in the error recovery request, thereby completing error recovery of the loop branch instruction.

Optionally, the state information further includes: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result; in the case where the loop predictor is integrated in the main predictor, the multiplexer is specifically configured to: output the first prediction result when the confidence value is greater than a preset threshold, and output the second prediction result when the confidence value is less than or equal to the preset threshold.

In the embodiment of the application, the multiplexer may select either the first prediction result of the loop predictor or the second prediction result of the main predictor for output. In general, a confidence value may be maintained in the loop predictor, the confidence value being positively correlated with the accuracy of the first prediction result; the multiplexer may output the first prediction result if the confidence value is greater than a preset threshold, and may output the second prediction result if the confidence value is less than or equal to the preset threshold.

Specifically, when the loop predictor is integrated in the main predictor and the confidence value is greater than the preset threshold, the first prediction result may be output so as to override the second prediction result.

Optionally, the state information further includes: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result; in the case where the loop predictor and the main predictor are provided separately, the multiplexer is specifically configured to: turn off the main predictor and output the first prediction result when the confidence value is greater than a preset threshold, and output the second prediction result when the confidence value is less than or equal to the preset threshold.

Specifically, when the loop predictor and the main predictor are provided independently of each other and the confidence value is greater than the preset threshold, the main predictor may be turned off and the first prediction result of the loop predictor output, which reduces the power consumption of the prediction stage.
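As a non-limiting illustration, the selection performed by the multiplexer may be sketched as follows. The function name `select_prediction` is hypothetical. Passing the main predictor as a callable that is simply not invoked when the confidence value exceeds the threshold is a software stand-in, assumed for this sketch, for turning off the main predictor in the separately provided configuration.

```python
from typing import Callable


def select_prediction(confidence: int, threshold: int, first_result: bool,
                      main_predictor: Callable[[], bool]) -> bool:
    """Return the prediction result chosen by the multiplexer."""
    if confidence > threshold:
        # Loop predictor is trusted: output the first prediction result.
        # The main predictor is not invoked, modelling it being turned off.
        return first_result
    # Otherwise output the second prediction result from the main predictor.
    return main_predictor()
```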

Optionally, referring to FIG. 4, the system further includes: an updating unit and a second cache unit; the updating unit is configured to: when the loop branch instruction is committed, acquire an actual execution result of the loop branch instruction, and record a committed iteration count value of the loop branch instruction in the second cache unit; increment the committed iteration count value if the actual execution result is a jump; and set the total iteration number of the loop branch instruction in the first cache unit to the committed iteration count value if the actual execution result is no jump.

In the embodiment of the application, the commit of the loop branch instruction indicates that the instruction has formally ended its life cycle. At this point the processor has executed the loop branch instruction and obtained its actual execution result, and the updating unit of the loop predictor may, when the loop branch instruction is committed, update the committed iteration count value of the loop branch instruction in the second cache unit according to the actual execution result.

For example, for a loop with a total number of iterations of 100, if the loop branch instruction executes in the 55 th iteration and completes the commit and the actual execution result is a jump, the committed iteration count of the loop branch instruction recorded by the update unit in the second cache unit is 55. If the loop branch instruction is executed and submitted in the 100 th iteration and the actual execution result is not jump, the submitted iteration count value of the loop branch instruction recorded in the second cache unit by the updating unit is 100, and the total iteration number in the state information of the loop branch instruction in the first cache unit is set to be 100, so that the generation and maintenance of the total iteration number are realized through the above mode.

Further, referring to FIG. 5, a schematic diagram of a loop branch instruction prediction process is shown. After the loop branch instruction is input, one branch, S1, is predicted by the main predictor to obtain the second prediction result; the other branch, S2-S7, is predicted by the loop predictor to obtain the first prediction result. For branch S2-S7, the loop predictor first executes S2 and receives the loop branch instruction; it then executes S3 to determine whether the tags match (see sub-step 2065 above); it executes S4 to determine, if the tags match, whether the confidence value is greater than the preset threshold (see sub-step 2065 above); and it executes S5 to determine, if the confidence value is greater than the preset threshold, whether the current iteration count value is the same as the total iteration number. When the current iteration count value is the same as the total iteration number, the first prediction result is output as no jump and the current iteration count value is reset to 0; when the current iteration count value is different from the total iteration number, the first prediction result is output as a jump and the current iteration count value is incremented. Finally, the multiplexer selects the first prediction result or the second prediction result for output.

FIG. 6 is a block diagram of an apparatus for processing a loop branch instruction according to an embodiment of the present application, the apparatus including:

a first obtaining module 301, configured to obtain an error recovery request for a loop branch instruction, where the error recovery request is sent by a processor after obtaining an actual execution result of the loop branch instruction, and the error recovery request includes the actual execution result and a target iteration count value; the target iteration count value reflects the number of iterations that the loop branch instruction has speculatively executed;

a second obtaining module 302, configured to obtain status information of the loop branch instruction from a preset cache unit; the state information reflects the execution condition of the loop branch instruction in the loop;

and a recovery module 303, configured to update the state information of the loop branch instruction in the cache unit according to the actual execution result and the target iteration count value, so as to complete error recovery of the loop branch instruction.

Optionally, the status information includes: a current iteration count value of the loop branch instruction;

the recovery module 303 includes:

a first updating sub-module, configured to reset, when the actual execution result is no jump, a current iteration count value of state information of the loop branch instruction in the cache unit to 0;

and a second updating sub-module, configured to modify the current iteration count value to the target iteration count value when the actual execution result is a jump.
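As a non-limiting illustration, the behaviour of the two updating sub-modules of the recovery module may be sketched as follows. The names `LoopEntry`, `RecoveryRequest` and `recover` are hypothetical and are used only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class LoopEntry:
    """Hypothetical state information held in the cache unit."""
    current_count: int  # current iteration count value


@dataclass
class RecoveryRequest:
    """Hypothetical error recovery request sent once the result is known."""
    taken: bool        # actual execution result: True for jump
    target_count: int  # target iteration count value


def recover(entry: LoopEntry, request: RecoveryRequest) -> None:
    """Repair the speculative counter using the error recovery request."""
    if request.taken:
        # Actual result is a jump: restore the count to the target value.
        entry.current_count = request.target_count
    else:
        # Actual result is no jump: the loop has ended, reset to 0.
        entry.current_count = 0
```

Because the request already carries the actual execution result and the target iteration count value, this update does not need to wait for the instruction to be committed.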

Optionally, the apparatus further includes:

a construction module, configured to construct, for a loop branch instruction to be predicted, state information corresponding to the loop branch instruction in a cache unit, wherein the state information includes: the current iteration count value and the total iteration number;

and a comparison module, configured to obtain a first prediction result by comparing the current iteration count value with the total iteration number.

Optionally, the comparison module includes:

a first comparison sub-module, configured to determine that the first prediction result is no jump when the current iteration count value is the same as the total iteration number;

and a second comparison sub-module, configured to determine that the first prediction result is a jump when the current iteration count value is different from the total iteration number.

Optionally, the comparison module further includes:

a first processing sub-module, configured to reset the current iteration count value in the state information of the loop branch instruction to 0 if the first prediction result is determined to be no jump;

and a second processing sub-module, configured to increment the current iteration count value in the state information of the loop branch instruction by a preset step number if the first prediction result is determined to be a jump.

Optionally, the state information further includes: a confidence value of the loop branch instruction and a first tag, the confidence value being positively correlated with the accuracy of the first prediction result;

the comparison module includes:

an execution sub-module, configured to obtain a first prediction result by comparing the current iteration count value with the total iteration number when the confidence value in the state information of the loop branch instruction is greater than a preset threshold and a second tag carried by the loop branch instruction matches the first tag.

Optionally, the apparatus further includes:

a commit module, configured to acquire an actual execution result of the loop branch instruction when the loop branch instruction is committed, and record a committed iteration count value of the loop branch instruction in the cache unit;

an increment module, configured to increment the committed iteration count value when the actual execution result is a jump;

and a setting module, configured to set the total iteration number of the loop branch instruction in the cache unit to the committed iteration count value when the actual execution result is no jump.

Optionally, the state information further includes: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result; the apparatus further comprises:

a history information acquisition module, configured to acquire the total iteration number that was set last time when the current total iteration number is set;

a first adjusting module, configured to increment the confidence value when the current total iteration number is the same as the total iteration number set last time;

and a second adjusting module, configured to decrement the confidence value when the current total iteration number is different from the total iteration number set last time.

For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.

In this specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; for identical or similar parts, the embodiments may be referred to one another.

The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be described in detail herein.

Embodiments of the present application provide a processing device for loop branch instructions, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs comprising instructions for performing the methods described in one or more of the embodiments.

Fig. 7 is a block diagram of an electronic device 600, according to an example embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 7, an electronic device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

The processing component 602 generally controls overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is used to store various types of data to support operations at the electronic device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, multimedia, and so forth. The memory 604 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power supply component 606 provides power to the various components of the electronic device 600. The power supply components 606 can include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.

The multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the electronic device 600 is in an operational mode, such as a shooting mode or a multimedia mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 610 is for outputting and/or inputting audio signals. For example, the audio component 610 includes a Microphone (MIC) for receiving external audio signals when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 614 includes one or more sensors for providing status assessment of various aspects of the electronic device 600. For example, the sensor assembly 614 may detect an on/off state of the electronic device 600 and the relative positioning of components, such as a display and keypad of the electronic device 600. The sensor assembly 614 may also detect a change in position of the electronic device 600 or a component of the electronic device 600, the presence or absence of a user's contact with the electronic device 600, an orientation or acceleration/deceleration of the electronic device 600, and a change in temperature of the electronic device 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 616 is utilized to facilitate communication between the electronic device 600 and other devices, either in a wired or wireless manner. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for implementing the method of processing a loop branch instruction provided by embodiments of the application.

In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as memory 604, including instructions executable by processor 620 of electronic device 600 to perform the above-described method. For example, the non-transitory storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Fig. 8 is a block diagram of an electronic device 700, according to an example embodiment. For example, the electronic device 700 may be provided as a server. Referring to fig. 8, electronic device 700 includes a processing component 722 that further includes one or more processors and memory resources represented by memory 732 for storing instructions, such as application programs, executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. In addition, the processing component 722 is configured to execute instructions to perform the method of processing a loop branch instruction provided by embodiments of the present application.

The electronic device 700 may also include a power supply component 726 configured to perform power management of the electronic device 700, a wired or wireless network interface 750 configured to connect the electronic device 700 to a network, and an input/output (I/O) interface 758. The electronic device 700 may operate based on an operating system stored in memory 732, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program, when executed by a processor, implements the above method of processing a loop branch instruction.

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

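1. A method of processing a loop branch instruction, the method comprising:

obtaining an error recovery request for a loop branch instruction, wherein the error recovery request is sent by a processor after obtaining an actual execution result of the loop branch instruction, and the error recovery request includes the actual execution result and a target iteration count value; the target iteration count value reflects the number of iterations that the loop branch instruction has speculatively executed;

obtaining state information of the loop branch instruction from a preset cache unit; the state information reflects the execution condition of the loop branch instruction in the loop;

and updating state information of the loop branch instruction in the cache unit according to the actual execution result and the target iteration count value, and completing error recovery of the loop branch instruction.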

2. The method of processing a loop branch instruction according to claim 1, wherein the state information comprises: a current iteration count value of the loop branch instruction;

the updating of the state information of the loop branch instruction in the cache unit according to the actual execution result and the target iteration count value comprises:

resetting the current iteration count value in the state information of the loop branch instruction in the cache unit to 0 when the actual execution result is no jump;

and modifying the current iteration count value to the target iteration count value when the actual execution result is a jump.
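
The update rule of claim 2 can be illustrated with a minimal Python sketch. The names `LoopEntry`, `recover`, and `target_count` are hypothetical and do not appear in the application; the sketch simply assumes that the target iteration count value is handed to the recovery routine as a parameter.

```python
from dataclasses import dataclass


@dataclass
class LoopEntry:
    """Hypothetical state information kept in the cache unit for one loop branch."""
    current_count: int = 0  # current iteration count value


def recover(entry: LoopEntry, taken: bool, target_count: int) -> None:
    """Update the entry on an error recovery request (illustrative only)."""
    if not taken:
        # Actual execution result is "no jump": reset the count to 0.
        entry.current_count = 0
    else:
        # Actual execution result is "jump": adopt the target iteration count value.
        entry.current_count = target_count
```

For example, an entry whose speculative count has run ahead to 5 would be set to 3 by `recover(entry, True, 3)`, and cleared to 0 by `recover(entry, False, 3)`.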

3. The method of processing a loop branch instruction according to claim 1, wherein prior to said obtaining of an error recovery request for a loop branch instruction, the method further comprises:

for a loop branch instruction to be predicted, constructing state information corresponding to the loop branch instruction in a cache unit, wherein the state information comprises: a current iteration count value and a total iteration number;

and comparing the current iteration count value with the total iteration number to obtain a first prediction result.

4. The method of processing a loop branch instruction according to claim 3, wherein said obtaining a first prediction result by comparing said current iteration count value and said total iteration number comprises:

when the current iteration count value is the same as the total iteration number, determining that the first prediction result is no jump;

and when the current iteration count value is different from the total iteration number, determining that the first prediction result is a jump.

5. The method of processing a loop branch instruction of claim 4, wherein the method further comprises:

resetting the current iteration count value in the state information of the loop branch instruction to 0 when the first prediction result is determined to be no jump;

and increasing the current iteration count value in the state information of the loop branch instruction by a preset step when the first prediction result is determined to be a jump.
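
The prediction rule of claims 4 and 5 can be sketched as follows. This is an illustrative Python model rather than the claimed hardware; `LoopEntry`, `predict`, and the default `step=1` are assumptions made for the example (the claims only require "a preset step").

```python
from dataclasses import dataclass


@dataclass
class LoopEntry:
    """Hypothetical state information for one loop branch instruction."""
    current_count: int = 0     # current iteration count value
    total_iterations: int = 0  # total iteration number


def predict(entry: LoopEntry, step: int = 1) -> bool:
    """Return True for "jump" and False for "no jump", updating the count."""
    if entry.current_count == entry.total_iterations:
        # Counts match: predict "no jump" and reset the count to 0.
        entry.current_count = 0
        return False
    # Counts differ: predict "jump" and advance the count by the preset step.
    entry.current_count += step
    return True
```

With `total_iterations = 3` and a count starting at 0, four successive calls yield jump, jump, jump, no jump, after which the count is back at 0.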

6. The method of processing a loop branch instruction according to claim 3, wherein the state information further comprises: a confidence value of the loop branch instruction and a first label, the confidence value being positively correlated with the accuracy of the first prediction result;

the obtaining of a first prediction result by comparing the current iteration count value with the total iteration number comprises:

when the confidence value in the state information of the loop branch instruction is greater than a preset threshold and a second label carried by the loop branch instruction matches the first label, obtaining the first prediction result by comparing the current iteration count value with the total iteration number.

7. The method of processing a loop branch instruction according to claim 3, wherein the method further comprises:

when the loop branch instruction is submitted, acquiring an actual execution result of the loop branch instruction, and recording a submitted iteration count value of the loop branch instruction in a cache unit;

increasing the submitted iteration count value when the actual execution result is a jump;

and setting the total iteration number of the loop branch instruction in the cache unit to the submitted iteration count value when the actual execution result is no jump.

8. The method of processing a loop branch instruction according to claim 7, wherein the state information further comprises: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result; the method further comprises:

when the total iteration number is set, acquiring the previously set total iteration number;

increasing the confidence value when the currently set total iteration number is the same as the previously set total iteration number;

and decreasing the confidence value when the currently set total iteration number differs from the previously set total iteration number.
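
The submission-time training described in claims 7 and 8 can be sketched in Python as below. Several details are assumptions added for the example and are not stated in the claims: the increment of 1, the saturation bounds `0` and `CONF_MAX`, and the reset of the submitted count after a "no jump" result. The names `CommitState` and `on_submit` are likewise hypothetical.

```python
from dataclasses import dataclass

CONF_MAX = 3  # assumed saturation limit; the claims do not specify one


@dataclass
class CommitState:
    """Hypothetical per-branch state updated when the instruction is submitted."""
    submitted_count: int = 0   # submitted iteration count value
    total_iterations: int = 0  # total iteration number
    confidence: int = 0        # confidence value


def on_submit(state: CommitState, taken: bool) -> None:
    """Train the entry with the actual execution result (illustrative only)."""
    if taken:
        # Actual result is "jump": increase the submitted count (by 1, assumed).
        state.submitted_count += 1
        return
    # Actual result is "no jump": record the new total and compare with the last one.
    last_total = state.total_iterations
    state.total_iterations = state.submitted_count
    if state.total_iterations == last_total:
        state.confidence = min(state.confidence + 1, CONF_MAX)
    else:
        state.confidence = max(state.confidence - 1, 0)
    # Assumption: start counting afresh for the next run of the loop.
    state.submitted_count = 0
```

Under these assumptions, two consecutive runs of a loop with the same trip count raise the confidence value, while a run with a different trip count lowers it again.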

9. A system for processing a loop branch instruction, the system comprising:

a loop predictor, a main predictor, and a multiplexer;

10. The system for processing a loop branch instruction of claim 9, wherein the state information further comprises: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result;

in the case where the loop predictor is integrated in the main predictor, the multiplexer is specifically configured to: output the first prediction result when the confidence value is greater than a preset threshold, and output the second prediction result when the confidence value is less than or equal to the preset threshold.

11. The system for processing a loop branch instruction of claim 9, wherein the state information further comprises: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result; in the case where the loop predictor and the main predictor are provided separately, the multiplexer is specifically configured to: turn off the main predictor and output the first prediction result when the confidence value is greater than a preset threshold, and turn off the loop predictor and output the second prediction result when the confidence value is less than or equal to the preset threshold.
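
The selection performed by the multiplexer in claims 10 and 11 can be modelled by a short Python sketch. The function name `select_prediction` and the use of callables for the two predictors are assumptions made for illustration; not invoking the unselected callable is only a loose software analogue of turning that predictor off.

```python
from typing import Callable


def select_prediction(confidence: int, threshold: int,
                      loop_predictor: Callable[[], bool],
                      main_predictor: Callable[[], bool]) -> bool:
    """Return the first or second prediction result based on the confidence value."""
    if confidence > threshold:
        # High confidence: use the loop predictor; the main predictor is not invoked.
        return loop_predictor()
    # Confidence at or below the threshold: use the main predictor instead.
    return main_predictor()
```

Note that a confidence value equal to the threshold selects the main predictor, matching the "less than or equal to" condition in the claims.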

12. The system for processing a loop branch instruction of claim 9, wherein the system further comprises:

an updating unit and a second cache unit;

the updating unit is configured to:

when the loop branch instruction is submitted, acquire an actual execution result of the loop branch instruction, and record a submitted iteration count value of the loop branch instruction in the second cache unit;

and set the total iteration number of the loop branch instruction in the first cache unit to the submitted iteration count value when the actual execution result is no jump.

13. The system for processing a loop branch instruction of claim 9, wherein the state information further comprises: a confidence value of the loop branch instruction, the confidence value being positively correlated with the accuracy of the first prediction result;

the system further comprises: a loop cache unit; the loop cache unit stores a loop branch instruction extracted from the instruction buffer;

the prediction unit is specifically configured to: read the loop branch instruction from the loop cache unit when the confidence value is greater than a preset threshold.

14. An apparatus for processing a loop branch instruction, the apparatus comprising:

15. An electronic device, comprising: a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 8.

16. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 8.