CN106933537B - Processor and method for detecting self-modifying code - Google Patents

Processor and method for detecting self-modifying code

Info

Publication number
CN106933537B
Authority
CN
China
Prior art keywords
instruction
storage element
ownership
cache line
stale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710137889.1A
Other languages
Chinese (zh)
Other versions
CN106933537A (en
Inventor
布兰特·比恩
柯林·艾迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhaoxin Semiconductor Co Ltd
Original Assignee
Shanghai Zhaoxin Integrated Circuit Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/156,403 (US9792216B1)
Application filed by Shanghai Zhaoxin Integrated Circuit Co Ltd
Publication of CN106933537A
Application granted
Publication of CN106933537B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 Prefetch instructions; cache control instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30189 Instruction operation extension or modification according to execution mode, e.g. mode flag

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

A processor and method for detecting self-modifying code. The processor and method determine memory ownership on a cache line basis in order to detect self-modifying code, including code with instructions that overlap cache line boundaries. An ownership index for each cache line is entered into an ownership queue together with the cache line address. The cache lines are translated into instructions. A straddle bit is set for each instruction derived from cache line data that spans two cache lines. A stale bit is set in any entry of the ownership queue that collides with a store instruction. An issued instruction is marked for a first exception when the stale bit of its corresponding entry is set, or when both the straddle bit of the issued instruction and the stale bit of the next sequential entry after its corresponding entry in the ownership queue are set. When an instruction about to retire is marked for the first exception, the first exception is performed. The invention can improve the efficiency of the processor.

Description

Processor and method for detecting self-modifying code
Technical field
The present invention relates to memory ownership, and more particularly to determining memory ownership on a cache line basis in order to detect self-modifying code.
Background
Self-modifying code (SMC) includes at least one instruction, executed by a local processor, that modifies another instruction or instruction sequence to be subsequently executed by the processor. Self-modifying code may include a sequence of code that modifies code that has just been executed, so that the modified code, with its new function, is executed again. In another example, self-modifying code modifies code that lies immediately ahead in the sequence and is about to be executed. Although self-modifying code is less common now than in the past, many legacy programs still contain it and must execute properly. The processor must be able to detect self-modifying code and correct its operation to avoid improper results. The term "processor" as used herein refers to any type of processing unit, including a microprocessor, a central processing unit (CPU), a processing core, a microcontroller, etc. The term also covers any type of processor configuration, such as multiple processing units integrated on a chip, or an integrated circuit (IC) that incorporates a system on a chip (SOC).
Modern processors often perform prefetch operations that read one or more lines of memory into an instruction cache (icache). The cache lines of the instruction cache are then parsed into instructions that are executed. To maximize efficiency, the fetch unit or a similar element attempts to fill the instruction cache and keep it full so that instructions are continuously available for execution. For the same reason, it is desirable to keep the execution pipeline as full as possible. Modern processors also commonly execute instructions out of order (OOO): an instruction received later that is ready to execute may be executed before an instruction received earlier that is not yet ready. At least one problem with prefetching and out-of-order operation is that instructions that have been prefetched and issued for execution may subsequently be modified by self-modifying code. Instructions already issued for execution may therefore miss the modification, which can lead to improper or unintended operation.
Modern processors need to detect stale instructions and prevent them from completing; a stale instruction is one that has been modified by code and is therefore no longer intended to be executed. A processor generally divides memory ownership into an instruction area, owned by the instruction cache, and a data area, owned by the data cache (dcache). The instruction area is intended to hold only instructions to be executed, whereas the data area is intended to hold data and information used by software programs. If the instruction cache attempts to read memory owned by the data cache, ownership must be transferred from the data cache, and this transfer is slow and tedious and tends to serialize operation.
In conventional architectures, ownership is based on page boundaries. A typical page size is 4 KB (kilobytes). Although 4 KB is not a significant amount of memory, self-modifying code can cause ownership to thrash between the instruction cache and the data cache, which reduces operating efficiency. One solution is to reduce the ownership granularity to a quarter page, that is, a 1 KB block of a 4 KB page. Even a 1 KB ownership block, however, is still large enough to cause trouble with self-modifying code in many cases. Moreover, larger page sizes such as 2 MB (megabytes) or even 1 GB (gigabytes) are often used, so the size of the ownership block remains a significant issue that can reduce overall efficiency.
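The effect of ownership granularity can be illustrated with a little arithmetic. The following is a minimal sketch, not taken from the patent: the helper `same_block` and both addresses are hypothetical, chosen so that a store lands 256 bytes away from an instruction. The 64-byte size is the example cache line length mentioned later in the description.

```python
def same_block(addr_a: int, addr_b: int, block_size: int) -> bool:
    """True when both addresses fall in the same aligned ownership block."""
    return addr_a // block_size == addr_b // block_size


CODE_ADDR = 0x1000  # assumed address of an instruction
DATA_ADDR = 0x1100  # assumed address written by a store, 256 bytes later

# Compare page-sized, quarter-page and cache-line-sized ownership blocks.
for size in (4096, 1024, 64):
    shared = same_block(CODE_ADDR, DATA_ADDR, size)
    print(f"{size:>5}-byte blocks: store shares a block with the code = {shared}")
```

With these assumed addresses the store shares an ownership block with the code at 4 KB and 1 KB granularity but not at 64-byte granularity, which is the motivation for tracking ownership per cache line.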
Summary of the invention
The present invention provides a processor that determines memory ownership on a cache line basis in order to detect self-modifying code, including code with instructions that overlap multiple cache lines. The processor has an ownership queue, a fetch system, a processing front end, a stale detection system, and an execution system. The fetch system provides the cache line data of a plurality of cache lines to the processing front end, determines an ownership index for each cache line, and enters each ownership index with the corresponding cache line address into one of the entries of the ownership queue. The processing front end translates the cache line data into a plurality of instructions and sets a straddle bit for each instruction derived from cache line data that spans two cache lines. The processing front end also issues each instruction for execution, attaching to each issued instruction the ownership index of its corresponding entry in the ownership queue. The stale detection system sets a stale bit in any entry of the ownership queue that collides with a store instruction. When the stale bit of the corresponding entry is set, or when both the straddle bit of the issued instruction and the stale bit of the next sequential entry after the corresponding entry in the ownership queue are set, the stale detection system marks the issued instruction for a first exception. When an instruction about to retire is marked for the first exception, the execution system performs the first exception.
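The marking rule for the first exception can be sketched in a few lines. This is an illustrative model only, not the patented hardware: the class and method names are hypothetical, the queue is modeled as a growing list rather than a circular structure with a wrap bit (WB), and a 64-byte cache line is assumed. The comments use the abbreviations from the symbol list (OWNI, STB, SDB).

```python
from dataclasses import dataclass

LINE = 64  # assumed cache line size in bytes


@dataclass
class Entry:
    cache_line_addr: int  # CA
    stale: bool = False   # STB


class OwnershipQueue:
    def __init__(self):
        self.entries = []

    def push(self, cache_line_addr: int) -> int:
        """Enter a fetched cache line; the returned position acts as OWNI."""
        self.entries.append(Entry(cache_line_addr))
        return len(self.entries) - 1

    def on_store(self, dest_addr: int) -> None:
        """Set STB in every entry whose cache line the store hits."""
        line = dest_addr & ~(LINE - 1)
        for entry in self.entries:
            if entry.cache_line_addr == line:
                entry.stale = True

    def needs_first_exception(self, owni: int, straddle: bool) -> bool:
        """STB of own entry, or SDB together with STB of the next entry."""
        if self.entries[owni].stale:
            return True
        nxt = owni + 1
        return straddle and nxt < len(self.entries) and self.entries[nxt].stale


q = OwnershipQueue()
first = q.push(0x1000)
second = q.push(0x1040)
q.on_store(0x1048)  # store into the second cache line
print(q.needs_first_exception(first, straddle=False))
print(q.needs_first_exception(first, straddle=True))
```

In this toy run, an instruction that lies wholly in the first line is unaffected by the store, while an instruction that starts in the first line and spills into the second is marked.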
In one embodiment, the first exception causes the execution system to flush the processor and to prevent the instruction marked for the first exception from retiring. The first exception also causes the fetch system to re-fetch, from the instruction cache, the instruction marked for the first exception.
In one embodiment, the execution system also determines the destination address of each issued store instruction. The stale detection system has a comparator and a stale detector. The comparator compares each cache line address being entered into the ownership queue with each destination address that has been determined, and sets the stale bit of the entry being entered when a match is found. The stale detector uses the ownership index of an issued instruction to read the stale bit of the corresponding entry in the ownership queue and the stale bit of the next sequential entry after the corresponding entry. When either the stale bit of the corresponding entry or the stale bit of the next sequential entry is set, the stale detector marks the issued instruction for the first exception.
In one embodiment, the processing front end uses the ownership index of an issued instruction to access the corresponding entry in the ownership queue and set the executing bit of that entry; when the straddle bit of the issued instruction is set, it also sets the executing bit of the next sequential entry after the corresponding entry. The execution system determines the destination address of each issued store instruction. The stale detection system has a comparator and a stale detector. As each destination address is determined by the execution system, the comparator compares it with the cache line address of each valid entry stored in the ownership queue and sets the stale bit of each matching entry. The stale detector evaluates the executing bit of each matching entry identified by the comparator, and when the executing bit of any matching entry is set, the stale detector marks the store instruction associated with the determined destination address for a second exception. In this embodiment, when a store instruction about to retire is marked for the second exception, the execution system performs the second exception. The second exception causes the execution system to allow the marked store instruction to retire, to flush the processor, and to obtain an instruction pointer in order to fetch, from the instruction cache, the instructions that follow the store instruction.
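The interplay of the executing bit (EXB) and the second exception can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual circuit: the function names `issue` and `resolve_store` are hypothetical, the queue is a plain list, a 64-byte cache line is assumed, and a store is treated as touching a single line.

```python
from dataclasses import dataclass

LINE = 64  # assumed cache line size in bytes


@dataclass
class Entry:
    cache_line_addr: int     # CA
    valid: bool = True
    stale: bool = False      # STB
    executing: bool = False  # EXB


def issue(entries, owni: int, straddle: bool) -> None:
    """Front end: set EXB of the instruction's entry (and the next if SDB)."""
    entries[owni].executing = True
    if straddle and owni + 1 < len(entries):
        entries[owni + 1].executing = True


def resolve_store(entries, dest_addr: int) -> bool:
    """Comparator plus stale detector for a store whose DA is now known.

    Sets STB in every valid matching entry and returns True when any
    matching entry has EXB set, i.e. the store should be marked for the
    second exception.
    """
    line = dest_addr & ~(LINE - 1)
    mark_store = False
    for entry in entries:
        if entry.valid and entry.cache_line_addr == line:
            entry.stale = True
            mark_store = mark_store or entry.executing
    return mark_store


queue = [Entry(0x2000), Entry(0x2040), Entry(0x2080)]
issue(queue, owni=0, straddle=True)  # instruction spans lines 0x2000 and 0x2040
print(resolve_store(queue, 0x2044))  # hits a line with EXB set
print(resolve_store(queue, 0x2080))  # hits a line that is fetched but not executing
```

The design point the sketch tries to capture is that a store into a line already being executed cannot be repaired by re-fetching one instruction, so the store itself is marked and the pipeline is flushed after it retires.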
In one embodiment, the execution system determines the destination address of each issued store instruction. The stale detection system has, for example, a first comparator and a second comparator. The first comparator compares each cache line address being entered into the ownership queue with each destination address that has been determined, and sets the stale bit of the entry being entered when a match is found. As each destination address is determined by the execution system, the second comparator compares it with the cache line address of each valid entry stored in the ownership queue and sets the stale bit of each matching entry. The processor uses the ownership index of an issued instruction to access the corresponding entry in the ownership queue and set the executing bit of that entry. When the straddle bit of the issued instruction is set, the executing bit of the next sequential entry after the corresponding entry is also set.
The present invention also provides a method of determining memory ownership on a cache line basis in order to detect self-modifying code, including code with instructions that overlap multiple cache lines. The method includes: fetching a plurality of cache lines, each having a cache line address and cache line data; determining an ownership index for each fetched cache line; entering each cache line address with the corresponding ownership index into one of the entries of an ownership queue; translating the cache line data of the cache lines into a plurality of instructions; setting a straddle bit for each instruction derived from cache line data that spans two cache lines; issuing each instruction for execution, and attaching the ownership index and the corresponding straddle bit to each issued instruction; setting a stale bit in any entry of the ownership queue that collides with a store instruction; marking an issued instruction for a first exception when the stale bit of its corresponding entry is set, or when both the straddle bit of the issued instruction and the stale bit of the next sequential entry after its corresponding entry in the ownership queue are set; and performing the first exception when an instruction about to retire is marked for the first exception.
In one embodiment, performing the first exception further includes, for example, preventing the instruction marked for the first exception from retiring, flushing the processor, and re-fetching the instruction marked for the first exception.
In one embodiment, the method further includes, for example, determining the destination address of each issued store instruction, comparing each cache line address being entered into the ownership queue with each destination address that has been determined, and setting the stale bit of the entry being entered when a match is found. The stale bit of the corresponding entry in the ownership queue is read using the ownership index of the issued instruction and, when the straddle bit of the issued instruction is set, the stale bit of the next sequential entry after the corresponding entry is also read. When either the stale bit of the corresponding entry or the stale bit of the next sequential entry is set, the issued instruction is marked for the first exception.
In one embodiment, the method further includes, for example, using the ownership index of an issued instruction to access the corresponding entry in the ownership queue and setting the executing bit of that entry. When the straddle bit of the issued instruction is set, the executing bit of the next sequential entry after the corresponding entry is also set. The destination address of each issued store instruction is determined. As each destination address is determined, it is compared with the cache line address of each valid entry in the ownership queue, and the stale bit of each matching entry is set. When the executing bit of any matching entry is set, the store instruction corresponding to the determined destination address is marked for a second exception. When a store instruction about to retire is marked for the second exception, the second exception is performed.
In one embodiment, the method further includes, for example, determining the destination address of each issued store instruction, comparing each cache line address being entered into the ownership queue with each destination address that has been determined, and setting the stale bit of the entry being entered when a match is found. As each destination address is determined, it is compared with the cache line address of each valid entry stored in the ownership queue, and the stale bit of each matching entry is set. The method further includes, for example, using the ownership index of an issued instruction to access the corresponding entry in the ownership queue and setting the executing bit of that entry. When the straddle bit of the issued instruction is set, the executing bit of the next sequential entry after the corresponding entry is also set. The method further includes, for example, marking the store instruction corresponding to the determined destination address for a pending second exception when the executing bit of one of the matching entries is set. When a store instruction about to retire is marked for the second exception, the second exception is performed.
The present invention can be improved the efficiency of processor.
Brief description of the drawings
The benefits, features, and advantages of the present invention will be better understood from the following description and drawings.
Fig. 1 is a simplified functional block diagram of a processor incorporating an ownership queue, implemented according to one embodiment, for establishing ownership between data and instructions.
Fig. 2 is a simplified functional block diagram of the ownership queue of Fig. 1, implemented according to one embodiment, together with its interfaces to the other ownership processing modules.
Fig. 3 is a flowchart of the operation of the processing front end of Fig. 1 according to one embodiment.
Fig. 4 is a flowchart of ownership and exception handling according to one embodiment.
Fig. 5 is a flowchart of execution, retirement, and exception handling according to one embodiment.
The reference symbols in the drawings are briefly described as follows:
100: processor; 101: ownership queue; 102: system memory; 103: prefetch engine; 104: processing front end; 105: instruction cache; 106: execution system; 107: fetch unit; 109: decoder; 111: loop queue; 113: loop detector; 115: instruction translator; 117: register alias table; 118: micro-operation; 119: branch predictor; 121: reorder buffer; 123: scheduler; 125: execution units; 127: store queue; 129: store pipeline; 130: data cache; 131: other execution units; 135: retire module; 137, 139: stale detect comparators; 141: overwrite detector; 143, 145: stale detectors; CA: cache line address; DA: destination address; EXB: executing bit; L, T1, T2: fields; IP: instruction pointer; OWNI: ownership index; SDB: straddle bit; STB: stale bit; UOP, UOPX: micro-operations; WB: wrap bit.
Detailed description
The inventors have recognized the memory ownership problems caused by self-modifying code. They have therefore developed an ownership queue that establishes memory ownership on a cache line basis in order to detect self-modifying code.
Fig. 1 is a simplified functional block diagram of a processor 100 incorporating an ownership queue (OWNQ) 101, implemented according to one embodiment, for establishing ownership between data and instructions. The standard instruction set architecture (ISA) of the processor 100 may be an x86 macroarchitecture, under which the processor can correctly execute most application programs designed to run on an x86 processor. An application program is correctly executed if its expected results are obtained. In particular, the processor 100 executes instructions of the x86 instruction set and includes the x86 user-visible register set. The present invention is not limited to the x86 architecture, however; the processor 100 may conform to any alternative instruction set architecture known to those of ordinary skill in the art. As shown, the processor 100 is coupled to an external system memory 102, which stores software programs, applications, data, and other information as understood by those of ordinary skill in the art. The processor 100 may have a bus interface unit (BIU) or similar element (not shown) for interfacing with the system memory 102. In a system-on-chip configuration, the processor 100, the system memory 102, and other processing function modules (not shown) may be integrated on a common integrated circuit.
The processor 100 has a processing system, which includes a processing front end 104, an execution system 106, and other processing modules described below. The processing front end 104 has a prefetch engine (PREFETCH) 103, an instruction cache (ICACHE) 105, a fetch unit 107, a decoder 109, a loop queue (LQ) 111, an instruction translator (XLATE) 115, a register alias table (RAT) 117, and a branch predictor 119. The execution system 106 generally has a reorder buffer (ROB) 121, a scheduler 123 (also known as reservation stations), execution units 125, and a store queue 127. The execution units 125 include at least one store pipeline 129 and other execution units 131, such as one or more integer (INT) units, one or more floating-point (or media) units, or at least one load pipeline. In one embodiment, the load pipeline and the store pipeline may be combined in a memory order buffer (MOB) (not shown) or similar element. The store pipeline 129 may also be coupled to a data cache (DCACHE) 130, which has one or more levels of data cache, such as a level-1 (L1) cache or a level-2 (L2) cache. The data cache 130 may also be coupled to the system memory 102. As shown, the reorder buffer 121 also has a retire module 135, described in more detail below.
Additional ownership logic and circuitry is provided along with the ownership queue 101 to make ownership decisions and detect self-modifying code, as described in detail below. The additional ownership logic and circuitry includes a first stale detect comparator (STALE DETECT COMPARATOR 1) 137, a second stale detect comparator (STALE DETECT COMPARATOR 2) 139, an overwrite detector 141, a first stale detector (STALE DETECTOR 1) 143, and a second stale detector (STALE DETECTOR 2) 145.
In general operation, the prefetch engine 103 fetches program information from the system memory 102 and stores it in cache lines of the instruction cache 105. Each cache line has a predetermined length, for example 64 bytes, although the cache line size is arbitrary and may differ in other configurations. The fetch unit 107 retrieves each cache line from the instruction cache 105 and provides the cache line data to the decoder 109, which parses the data into instruction information. The decoder 109 divides and formats the cache line data into instructions and corresponding information, such as operands. For example, when the processor 100 supports the x86 instruction set architecture, the instructions are x86 instructions. Each such ISA instruction is referred to herein as a macro-instruction, or macro-operation (macro-op), of the instruction set supported by the processor 100. The macro-ops provided by the decoder 109 are pushed into the loop queue 111 and then provided to the instruction translator 115. The instruction translator 115 translates each macro-op into one or more corresponding micro-instructions or micro-operations (uops), which are configured according to the native instruction set of the processor 100. An instruction pointer (IP) is also determined and provided with each uop as it is passed to the reorder buffer 121. The uops are provided to the register alias table 117, which generates dependency information for each uop based on its program order, its operand sources, and renaming information.
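Because x86 instructions are variable in length while cache lines have a fixed size, an instruction can begin in one cache line and end in the next. A minimal sketch of that check follows, assuming the 64-byte line size given as an example above; the helper names and the sample addresses are hypothetical and only illustrate when a straddle bit (SDB) would apply.

```python
LINE = 64  # example cache line size in bytes


def lines_touched(addr: int, length: int):
    """Return the indices of the first and last cache line an instruction occupies."""
    first = addr // LINE
    last = (addr + length - 1) // LINE
    return first, last


def straddles(addr: int, length: int) -> bool:
    """True when the instruction bytes span two cache lines."""
    first, last = lines_touched(addr, length)
    return first != last


# A 6-byte instruction starting 4 bytes before a line boundary spills over.
print(straddles(0x103C, 6))
# A 4-byte instruction at the same address ends exactly at the boundary.
print(straddles(0x103C, 4))
```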
Each uop (together with its associated information) from the register alias table 117 is pushed in program order into the reorder buffer 121 and into the scheduler 123. The scheduler 123 has at least one queue that holds each uop and its dependency information received from the register alias table 117. The scheduler 123 dispatches received uops to the corresponding execution units 125 when they are ready to execute. Store uops are provided to the store pipeline 129 for processing, and all other instruction types are provided to the appropriate unit among the other execution units 131 (for example, integer instructions to an integer execution unit and media instructions to a media execution unit). A uop is considered ready to execute when all of its dependencies are resolved. In conjunction with dispatching a uop, the register alias table 117 allocates an entry in the reorder buffer 121 for that uop, so the uops are allocated in the reorder buffer 121 in program order. The reorder buffer 121 is configured, for example, as a circular queue to ensure that the uops retire in program order. The register alias table 117 also provides the corresponding instruction pointer and dependency information to the reorder buffer 121 for storage in the uop's entry, which also holds the uop's operands and results. In one embodiment, a separate physical register file (PRF) (not shown) may be included, in which case the register alias table 117 may also allocate or map one or more physical registers of the PRF to each uop for storing operands and results.
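The way a queue-ordered reorder buffer enforces in-order retirement even when execution completes out of order can be sketched as below. This is a toy model, not the structure of reorder buffer 121: the class, the `done` flag, and the uop names are all hypothetical, and a `deque` stands in for the circular queue.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: allocate in program order, retire from the head only."""

    def __init__(self):
        self.entries = deque()

    def allocate(self, name: str) -> dict:
        # Entries are appended in program order as uops are issued.
        entry = {"name": name, "done": False}
        self.entries.append(entry)
        return entry

    def retire(self) -> list:
        # Only the oldest entry may retire, and only once it has completed.
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["name"])
        return retired


rob = ReorderBuffer()
a, b, c = (rob.allocate(n) for n in ("uop_a", "uop_b", "uop_c"))
c["done"] = True      # youngest uop finishes first (out of order)
print(rob.retire())   # nothing retires while uop_a is still pending
a["done"] = True
b["done"] = True
print(rob.retire())   # all three retire, in program order
```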
The results of the execution units 125 are, for example, fed back to the reorder buffer 121, which updates the corresponding fields and/or the architectural registers or similar elements. In an embodiment with a physical register file, the reorder buffer 121 holds pointers that are used to update the corresponding registers in the physical register file. In one embodiment, the register alias table 117 maps architectural registers to physical registers in the physical register file and updates pointers or similar information (not shown) in the reorder buffer 121 corresponding to the uops. The pointers in the reorder buffer 121 may be updated during or after execution, and they are used to update the contents of the registers in the physical register file during operation. The retire module 135 in the reorder buffer 121 ultimately retires the uops in program order to ensure proper operation consistent with the instructions of the software program or application. When a uop is marked or otherwise indicated as having an exception, the retire module 135 takes the appropriate action according to the type of exception, as described in detail below.
Store uops pushed into the store pipeline 129 for execution are also pushed into corresponding entries of the store queue 127. When a store uop is initially pushed from the register alias table 117, the addresses of its operands, including the destination address (DA), may not yet be known. When the store pipeline 129 determines the destination address of a store uop being executed, it provides the destination address to the corresponding entry in the store queue 127.
Branch predictor 119 detects branch macro operations output by decoder 109 and/or held in round-robin queue 111, and generates a branch prediction result indicating whether each branch is predicted to be taken. Branch predictor 119 communicates with acquisition unit 107, which can branch to a different location in instruction cache memory 105 according to the branch prediction result. Acquisition unit 107 also communicates with pre-acquisition engine 103. Therefore, when the branch location is not in instruction cache memory 105, pre-acquisition engine 103 obtains the corresponding location from system storage 102 and loads it into instruction cache memory 105.
In normal operation, macro operations from decoder 109 are buffered in round-robin queue 111 and provided through it to instruction translator 115. Circulation detector 113 identifies a loop when it determines that the instructions of the loop are being iterated repeatedly and the loop lies entirely, or at least partially, within round-robin queue 111. The instructions of the loop are then taken repeatedly from round-robin queue 111 rather than from instruction cache memory 105. In one embodiment, circulation detector 113 detects a loop when a preset number of loop iterations has occurred. In a specific embodiment, the number of iterations is 24, but other suitable numbers may be used. In one embodiment, circulation detector 113 assumes that the loop may continue indefinitely, and therefore keeps repeating the loop operation until a prediction proves incorrect (the loop branch is not taken). At that point the system is flushed, and acquisition unit 107 begins obtaining information from the next location after the loop in instruction cache memory 105 (or possibly from another branch location).
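The iteration-count criterion described above can be sketched in a few lines. This is only an illustrative model, not the circuit of circulation detector 113: the class name `LoopDetector`, the idea of identifying a loop by the address of its branch, and the reset-on-not-taken behavior are assumptions; only the threshold of 24 comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

LOOP_THRESHOLD = 24  # iteration count named in the text; other values may be used


@dataclass
class LoopDetector:
    """Hypothetical model: counts consecutive taken iterations of one loop branch."""
    threshold: int = LOOP_THRESHOLD
    branch_ip: Optional[int] = None  # assumption: a loop is identified by its branch address
    count: int = 0

    def observe(self, branch_ip: int, taken: bool) -> bool:
        """Record one loop-branch outcome; return True once a loop is detected."""
        if not taken:
            # The loop branch fell through, so the loop has ended: start over.
            self.branch_ip, self.count = None, 0
            return False
        if branch_ip != self.branch_ip:
            # A different branch starts a fresh count.
            self.branch_ip, self.count = branch_ip, 1
        else:
            self.count += 1
        return self.count >= self.threshold
```

With the default threshold, the first 23 taken iterations of the same branch report no loop and the 24th reports one.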
Once circulation detector 113 has detected a loop, acquisition unit 107 may continue to fetch cache lines and add them to the buffer of decoder 109 until the buffer is full, at which point fetching may temporarily stop. In one embodiment, when circulation detector 113 detects a loop, acquisition unit 107 repeatedly fetches the cache lines of the loop. In another embodiment, acquisition unit 107 is notified that circulation detector 113 has detected a loop and starts reading data outside the loop, for example starting at the next sequential location after the loop. In either case, decoder 109 may fill up while the loop is in progress.
When acquisition unit 107 adds cache line data to decoder 109, it also adds the corresponding cache line address (CA) to a storage element in ownership queue 101 and marks that storage element as valid. Ownership queue 101 may be organized as a circular buffer or a similar structure, with an add pointer and a release pointer that distinguish allocated storage elements from deallocated ones. In another embodiment, each storage element in ownership queue 101 has a valid bit or valid value that distinguishes valid storage elements from invalid ones, and the valid bit is set for each new storage element added to ownership queue 101. In one embodiment, acquisition unit 107 determines an ownership index (OWNI) and a winding (wrap) bit (WB). The ownership index and the winding bit correspond to the cache line address of the cache line, and their values are added, together with the cache line address, to the corresponding storage element in ownership queue 101. The ownership index uniquely identifies each storage element in ownership queue 101. The winding bit is used to detect an overwrite in ownership queue 101.
Register alias table 117 identifies the last microoperation of each cache line according to the corresponding ownership index and marks that microoperation as the last one of the cache line, so that this information is provided to resequencing buffer 121. When exit module 135 exits a microoperation, it determines whether the exiting microoperation is marked as the last microoperation of a given cache line in ownership queue 101. If so, exit module 135 instructs ownership queue 101 to release the corresponding storage element or to invalidate it.
Whenever acquisition unit 107 adds a new cache line address to a storage element in ownership queue 101, the cache line address is also supplied to an input of first overtime detecting comparator 137. Overtime detecting comparator 137 also reads each valid destination address (DA) from storage queue 127 and compares each destination address with the new cache line address to determine whether any of them match. Overtime detecting comparator 137 can be regarded as a new-storage-element comparator. When the cache line address matches any destination address, the overtime bit STB of the corresponding storage element in ownership queue 101 is set. The overtime bit STB indicates that a storage microoperation hits the cache line, that is, a storage instruction has modified, or will modify, the cache line. When a storage instruction hits or collides with a cache line held in a valid storage element of ownership queue 101, any instruction derived from that cache line may be invalid. Thus, when the overtime bit STB is set, any microoperation from that cache line may be invalid (that is, overtime).
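The comparison performed when a new cache line address enters the ownership queue can be modeled as follows. This is a minimal sketch under stated assumptions: the names `OwnershipEntry`, `add_cache_line` and `line_of` are hypothetical, the 64-byte line size is assumed (the text gives none), and a destination address of `None` stands for a storage microoperation whose address is not yet known.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 64  # assumed cache line size; not specified in the text


def line_of(addr: int) -> int:
    """Return the address of the cache line that contains addr."""
    return addr & ~(LINE_BYTES - 1)


@dataclass
class OwnershipEntry:
    cache_line_addr: int
    valid: bool = True
    stb: bool = False  # overtime bit STB


def add_cache_line(queue: List[OwnershipEntry],
                   store_das: List[Optional[int]],
                   cache_line_addr: int) -> OwnershipEntry:
    """Add a new entry and set STB if any known store destination hits the line."""
    entry = OwnershipEntry(line_of(cache_line_addr))
    entry.stb = any(da is not None and line_of(da) == entry.cache_line_addr
                    for da in store_das)
    queue.append(entry)
    return entry
```

For example, with a pending store to `0x1008`, a newly added line at `0x1000` is marked overtime while a line at `0x2000` is not.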
The ownership index value and the winding bit value are also attached to, or associated with, the corresponding cache line data provided to decoder 109. Decoder 109 attaches the corresponding winding bit value and ownership index value to each macro operation, identifying the cache line from which the macro operation was obtained. When multiple macro operations are taken from the same cache line, each of them is assigned the same winding bit and ownership index. In one embodiment, macro operations are not aligned with the cache lines in instruction cache memory 105, so each macro operation also has an across-vertical (straddle) bit SDB. The across-vertical bit SDB identifies the case in which a macro instruction straddles two different cache lines, that is, it starts in one cache line and ends in the next sequential cache line. In that case, decoder 109 attaches the ownership index of the first line and sets the across-vertical bit of the macro operation to true. When a macro operation is contained within a single cache line, the across-vertical bit is set to false. When added to instruction translator 115, each macro operation carries its winding bit, ownership index and across-vertical bit. Setting a bit or a field (a field has at least one bit) to true means setting it to logic 1, and setting it to false means setting it to logic 0.
Instruction translator 115 translates each macro operation into one or more microoperations. During translation, each microoperation generated from a macro operation receives the same winding bit value, ownership index value and across-vertical bit value as that macro operation. Thus, when a macro operation is translated into three separate microoperations, each of the three carries the same winding bit value, ownership index value and across-vertical bit value as the original macro operation. Each microoperation retains these values as it passes through register alias table 117.
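The inheritance of the three tag values during translation can be illustrated with a small sketch. The types `Tags`, `MacroOp` and `MicroOp` and the function `translate` are hypothetical names introduced only for this example; how many microoperations a given macro operation produces is passed in as a parameter rather than modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tags:
    wb: int     # winding bit WB
    owni: int   # ownership index OWNI
    sdb: bool   # across-vertical bit SDB


@dataclass(frozen=True)
class MacroOp:
    name: str
    tags: Tags


@dataclass(frozen=True)
class MicroOp:
    name: str
    tags: Tags


def translate(macro: MacroOp, n_uops: int) -> List[MicroOp]:
    """Produce n_uops microoperations that all carry the macro operation's tags."""
    return [MicroOp(f"{macro.name}.uop{i}", macro.tags) for i in range(n_uops)]
```

Translating one macro operation into three microoperations therefore yields three entries with identical WB, OWNI and SDB values.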
An exemplary microoperation uopx is shown at 118 in Fig. 1. Microoperation uopx represents any microoperation defined in processor 100 that is issued from register alias table 117 and added to resequencing buffer 121 and scheduler 123. Each microoperation has multiple fields that facilitate its operation or its execution by execution system 106 of processor 100. One or more fields (not shown) identify the specific instruction, the instruction type and its associated operands, such as constant operands, addresses, storage locations and register indexes. Other fields store the instruction pointer IP, the winding bit WB, the ownership index OWNI and the across-vertical bit SDB. As explained below, each microoperation also has a field T1 to indicate an exception of the first type, a field T2 to indicate an exception of the second type, and a field L to indicate whether register alias table 117 has marked the microoperation as the last instruction of its cache line.
When each microoperation is issued from register alias table 117 and added to resequencing buffer 121 and scheduler 123, register alias table 117 uses the value of the microoperation's ownership index OWNI to access the corresponding storage element in ownership queue 101 and sets the execution bit EXB of that storage element. When the across-vertical bit of the microoperation is true, indicating an across-vertical instruction, register alias table 117 also sets the execution bit of the next sequential storage element in ownership queue 101. The execution bit of a storage element is later used to detect a hit by a storage microoperation that was not detected as overtime earlier.
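Setting the execution bit at issue time, including the across-vertical case, might look like the sketch below. The names `Entry` and `mark_executing` are hypothetical, and the wrap-around from the last storage element to the first is an assumption based on the queue being described as a circular buffer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    exb: bool = False  # execution bit EXB


def mark_executing(queue: List[Entry], owni: int, sdb: bool) -> None:
    """Set EXB for the microoperation's entry, and for the next entry if it straddles."""
    queue[owni].exb = True
    if sdb:
        # Assumption: "next sequential" wraps around the circular queue.
        queue[(owni + 1) % len(queue)].exb = True
```

Issuing an across-vertical microoperation whose index points at the last element of a four-element queue sets EXB on elements 3 and 0.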
When each microoperation is output from register alias table 117, overriding detector 141 uses the microoperation's ownership index value to access the corresponding storage element in ownership queue 101 and reads the winding bit value of that storage element. When the winding bit value of the microoperation does not match the winding bit value of the corresponding storage element in ownership queue 101, an overwrite has occurred, and overriding detector 141 sets bit T1 of the microoperation (marks field T1 as true) to indicate that a first exception, or an exception of the first type, is to be performed when the microoperation is about to exit. In one embodiment, bit T1 of the microoperation is set by overriding detector 141 when the microoperation is issued, before it is added to resequencing buffer 121. In another embodiment, bit T1 of the storage element in resequencing buffer 121 is set when or after the microoperation is added to resequencing buffer 121, either by overriding detector 141 or by resequencing buffer 121 at the direction of overriding detector 141. A winding bit mismatch generally indicates that a loop has caused an overwrite in ownership queue 101, so that self-modifying program code can no longer be detected for the corresponding cache line. Exit module 135 detects that T1 is set, which indicates that the microoperation in the corresponding storage element of resequencing buffer 121 is marked with an exception of the first type. An overwrite means that a storage element in ownership queue 101 has been overwritten, so that program code modifications associated with the corresponding cache line might go undetected and lead to incorrect results. More specifically, the exception of the first type flushes the machine to prevent such an incorrect outcome.
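The winding bit comparison reduces to a single inequality test, as the following sketch shows. `Entry`, `MicroOp` and `check_overwrite` are hypothetical names; the sketch models only the variant in which T1 is recorded on the microoperation itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    wb: int               # winding bit WB stored in the ownership queue
    cache_line_addr: int


@dataclass
class MicroOp:
    wb: int               # winding bit WB carried by the microoperation
    owni: int             # ownership index OWNI
    t1: bool = False      # exception of the first type


def check_overwrite(queue: List[Entry], uop: MicroOp) -> bool:
    """Set T1 when the entry indexed by OWNI no longer has the microoperation's WB."""
    if queue[uop.owni].wb != uop.wb:
        uop.t1 = True
    return uop.t1
```

A microoperation tagged with WB = 0 passes the check until its storage element is reused on the next pass with WB = 1, after which the check marks it with T1.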
When each microoperation is output from register alias table 117, first overtime detector 143 uses the microoperation's ownership index to access the corresponding storage element in ownership queue 101 and reads the overtime bit STB of that storage element. When the across-vertical bit of the microoperation is true, first overtime detector 143 also reads the overtime bit STB of the next sequential storage element in ownership queue 101. When the overtime bit STB of the storage element is true, or when the microoperation is an across-vertical instruction as indicated by its across-vertical bit SDB and the overtime bit of the next sequential storage element in ownership queue 101 is true, first overtime detector 143 marks the microoperation with an exception of the first type by setting field T1 (bit T1) to true. Overtime detector 143 can be regarded as an issue-time overtime detector that detects possibly invalid instructions when they are issued. As with overriding detector 141, field T1 may be set to true before, when or after the microoperation is added to its storage element in resequencing buffer 121, either by first overtime detector 143 or by resequencing buffer 121. As mentioned earlier, the overtime bit STB indicates that the cache line has been modified by a storage microoperation, so the instruction may be invalid.
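The issue-time check can be expressed as a short predicate over the overtime bits. In this sketch the queue is reduced to a list of STB values, `issue_time_t1` is a hypothetical name, and the modulo wrap-around for the next sequential storage element is an assumption.

```python
from typing import List


def issue_time_t1(stb_bits: List[bool], owni: int, sdb: bool) -> bool:
    """Return True if the microoperation must be marked T1 at issue.

    stb_bits[i] is the overtime bit STB of storage element i.
    """
    if stb_bits[owni]:
        return True
    # An across-vertical microoperation also depends on the next cache line.
    # Assumption: the next element wraps around the circular queue.
    return sdb and stb_bits[(owni + 1) % len(stb_bits)]
```

For a queue whose element 2 is overtime, a microoperation with index 1 is marked only if it is across-vertical, while a microoperation with index 2 is always marked.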
Whenever storage pipeline 129 generates a destination address (DA) for a storage microoperation, the destination address is provided not only to update the corresponding storage element in storage queue 127 but also to an input of second overtime detecting comparator 139. Overtime detecting comparator 139 also accesses all valid cache line addresses in ownership queue 101 and compares the new destination address with each of them. Overtime detecting comparator 139 can be regarded as a new-destination-address comparator. When there is a match, overtime detecting comparator 139 sets the overtime bit of the corresponding storage element in ownership queue 101 to true. In addition, when overtime detecting comparator 139 detects a match, the corresponding ownership index is provided to an input of second overtime detector 145. Overtime detector 145 accesses the corresponding storage element in ownership queue 101 and reads its execution bit EXB. When the execution bit EXB of the storage element is true, overtime detector 145 causes the storage element of the storage microoperation in resequencing buffer 121 to be marked with a second exception, or an exception of the second type, by setting field T2 of that storage element to true. Overtime detector 145 can be regarded as an execution-time overtime detector that detects possibly invalid instructions already in execution. Overtime detector 145 may access the storage element of the storage microoperation in resequencing buffer 121 directly to set T2, or it may instruct resequencing buffer 121 to set T2.
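The two steps triggered by a newly resolved destination address (setting STB on a match, then consulting EXB to decide on T2) can be combined in one sketch. `Entry` and `on_store_address` are hypothetical names and the 64-byte line size is an assumption; the return value stands in for setting field T2 of the storage microoperation in resequencing buffer 121.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    cache_line_addr: int
    valid: bool = True
    exb: bool = False  # execution bit EXB
    stb: bool = False  # overtime bit STB


def on_store_address(queue: List[Entry], da: int, line_bytes: int = 64) -> bool:
    """Compare a new destination address with every valid entry.

    Sets STB on each matching entry and returns True if the store must be
    marked T2 because a matching entry already has microoperations in execution.
    """
    line = da & ~(line_bytes - 1)
    t2 = False
    for entry in queue:
        if entry.valid and entry.cache_line_addr == line:
            entry.stb = True
            if entry.exb:
                t2 = True
    return t2
```

A store into a line that is valid but not yet executing only sets STB, leaving the issue-time check to catch later microoperations, whereas a store into a line already in execution also requests T2.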
A simplified form of the storage element in resequencing buffer 121 associated with a specific microoperation uopx, corresponding to exemplary microoperation 118, is shown in Fig. 1. Each microoperation storage element has a field T1 to indicate an exception of the first type, a field T2 to indicate an exception of the second type, and a field L to indicate whether the instruction is the last microoperation of a cache line as marked by register alias table 117. Field L is set to true when the microoperation is the last microoperation of a cache line, and to false otherwise. Exit module 135 examines fields T1 and T2 of the microoperation's storage element in resequencing buffer 121 and executes or initiates the corresponding exception routine or program. Any microoperation, including a storage microoperation, may be marked with an exception of the first type, but only a storage microoperation can be marked with an exception of the second type.
Exit module 135 detects when each microoperation is ready to exit, for example when it is the oldest instruction in resequencing buffer 121. When a microoperation is ready to exit, exit module 135 also examines fields T1, T2 and L in the corresponding storage element. When field T1 is true, exit module 135 generates an exception of the first type for the microoperation, and when field T2 is true, it generates an exception of the second type for the microoperation. When fields T1 and T2 are both false and field L is true, exit module 135 instructs ownership queue 101 to release or invalidate the corresponding storage element, so that the completed cache line is effectively removed from ownership queue 101.
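The decision made for a microoperation that is ready to exit can be summarized as a small function. The enum `ExitAction` and the function `exit_action` are hypothetical names, and checking T1 before T2 when both are set is an assumption, since the text does not state a priority.

```python
from enum import Enum


class ExitAction(Enum):
    EXCEPTION_TYPE_1 = 1    # flush; the microoperation does not exit
    EXCEPTION_TYPE_2 = 2    # the storage microoperation exits, then flush
    EXIT_AND_RELEASE = 3    # exit and release the ownership queue element
    EXIT = 4                # ordinary exit


def exit_action(t1: bool, t2: bool, last: bool) -> ExitAction:
    """Choose the action from fields T1, T2 and L of the oldest microoperation."""
    if t1:
        # Assumption: T1 takes priority if both T1 and T2 are set.
        return ExitAction.EXCEPTION_TYPE_1
    if t2:
        return ExitAction.EXCEPTION_TYPE_2
    return ExitAction.EXIT_AND_RELEASE if last else ExitAction.EXIT
```

Only a microoperation with both exception fields clear and L set causes its cache line's storage element to be released.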
When the oldest microoperation in resequencing buffer 121 (that is, the one about to exit) is marked with an exception of the first type, resequencing buffer 121 broadcasts a corresponding exception signal within processor 100 and the processing system is flushed. In this case, every macro operation and microoperation in the execution pipeline is effectively invalidated, including the microoperation that caused the exception. When an exception of the first type occurs, all microoperations that have not exited are flushed, including any storage microoperations in storage queue 127 that have not exited. Storage microoperations that have already exited remain in storage queue 127 until their data is committed to the memory architecture (such as data cache 130 and/or system storage 102). The microoperation that caused the exception of the first type is not allowed to exit, and the instruction pointer recorded for it in resequencing buffer 121 can be used to access the address of the microoperation in instruction cache memory 105. Pre-acquisition engine 103 and acquisition unit 107 are temporarily stopped. Processor 100 traps to an exception routine in a microcode read-only memory (not shown) of processor 100, and the corresponding exception code indicates the type of the exception. Once the processing system has been flushed, the exception routine retrieves the instruction pointer and passes it to acquisition unit 107 to fetch again the macro operation associated with the microoperation that caused the exception.
The exception of the second type, for storage microoperations, is similar to the exception of the first type for other kinds of microoperation. In this case, however, the storage microoperation is allowed to exit, so that it completes its operation and updates the memory location indicated by its destination address. Because the memory location is initially owned by instruction cache memory 105, whereas the storage microoperation is a data operation that requires ownership by data cache 130 of processor 100, a snoop is initiated to invalidate the corresponding cache line in instruction cache memory 105. It is thereby ensured that the memory modification and the invalidation occur when the exception takes place. As with the exception of the first type, the exception routine for the exception of the second type flushes the machine, then retrieves the instruction pointer and passes it to acquisition unit 107 so that fetching restarts at that location. Because the storage microoperation that caused the exception of the second type is allowed to complete, the instruction pointer is advanced to the next instruction after the storage instruction in instruction cache memory 105, and operation continues from the location following the storage instruction.
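The difference between the two recovery paths comes down to whether the faulting microoperation exits and where fetching resumes. The sketch below captures just that contrast; `recovery` is a hypothetical name, and `next_ip` (the address of the instruction following the storage instruction) is assumed to be available to the exception routine.

```python
from typing import Tuple


def recovery(exc_type: int, ip: int, next_ip: int) -> Tuple[bool, int]:
    """Return (faulting microoperation exits, address at which fetching restarts)."""
    if exc_type == 1:
        # First type: the microoperation does not exit and is fetched again.
        return (False, ip)
    if exc_type == 2:
        # Second type: the store completes, so fetching resumes after it.
        return (True, next_ip)
    raise ValueError("unknown exception type")
```

In both cases the machine is flushed before fetching restarts; only the restart address and the fate of the faulting microoperation differ.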
Fig. 2 is a simplified functional block diagram of ownership queue 101 of Fig. 1, implemented according to one embodiment, together with its interfaces to the other ownership processing modules. Ownership queue 101 has multiple storage elements. Each storage element has a field WRAP to store the winding bit, a field OWNI to store the corresponding ownership index value, an execution field to store the corresponding execution bit, a valid field to store the valid bit, a cache line address field to store the corresponding cache line address, and an overtime field to store the corresponding overtime bit.
In one embodiment, the ownership index is a count value that is incremented each time a storage element is added to ownership queue 101. To ensure that every storage element in ownership queue 101 has a unique ownership index value, the number of bits B of the ownership index is chosen according to the number N of storage elements in ownership queue 101 such that 2^B ≥ N. In one example, as shown in Fig. 2, ownership queue 101 has N = 32 storage elements and the ownership index has 5 bits. In one embodiment, acquisition unit 107 determines the winding bit in a similar manner, as an additional most significant bit of the ownership index. In this case, the winding bit is 0b (where b denotes a binary digit) while the ownership index counts from 0 up to a maximum value, the maximum value being determined by the total number of storage elements in ownership queue 101. The ownership index is then reset to 0 and counts up to the maximum value again, this time with a winding bit of 1b. In other words, the winding bit WB toggles between its two values on each complete pass through ownership queue 101. For an ownership index of B bits, the total number of storage elements may be less than the maximum possible number 2^B. For example, with a total of 26 storage elements, on the first pass (OWNI counts from decimal 0 to decimal 25 with WB equal to 0) WB|OWNI counts from 0|00000b up to 0|11001b. On the second pass (OWNI counts from decimal 0 to decimal 25 with WB equal to 1), it counts from 1|00000b up to 1|11001b. Subsequent passes repeat in the same manner.
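The counting scheme above can be reproduced with integer division and remainder. This is a sketch rather than the hardware counter: `index_bits`, `ownership_tag` and `fmt` are hypothetical helper names, and `n` is simply the running count of cache lines added since reset.

```python
from typing import Tuple


def index_bits(num_entries: int) -> int:
    """Smallest B such that 2**B >= num_entries (at least 1 bit)."""
    return max(1, (num_entries - 1).bit_length())


def ownership_tag(n: int, num_entries: int) -> Tuple[int, int]:
    """Return (WB, OWNI) for the n-th cache line added (n counts from 0)."""
    owni = n % num_entries          # index wraps after the last element
    wb = (n // num_entries) % 2     # winding bit toggles on every full pass
    return wb, owni


def fmt(wb: int, owni: int, bits: int) -> str:
    """Format a tag in the WB|OWNI notation used in the text."""
    return f"{wb}|{owni:0{bits}b}b"
```

With 26 storage elements and a 5-bit index, the 26th line added (n = 25) is tagged `0|11001b` and the 27th (n = 26) is tagged `1|00000b`, matching the example in the text.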
As mentioned earlier, acquisition unit 107 inserts a new cache line address CA into the cache line address field, sets the corresponding valid bit in the valid field, determines the corresponding ownership index and inserts it into field OWNI, and determines the corresponding winding bit WB and inserts it into field WRAP. The newly added cache line address is provided to an input of overtime detecting comparator 137, which also receives the destination addresses DA from storage queue 127. When the new cache line address matches any destination address from storage queue 127, the corresponding overtime bit in the overtime field is set to true. When each microoperation is issued from register alias table 117, register alias table 117 uses the ownership index of the microoperation to access the corresponding storage element in ownership queue 101 and sets its execution bit EXB. In addition, when the across-vertical bit of the microoperation is set to indicate an across-vertical microoperation, register alias table 117 accesses the next storage element in ownership queue 101 and sets its execution bit as well. When the last microoperation of a cache line exits, resequencing buffer 121 accesses the corresponding storage element in ownership queue 101 and resets or clears its valid bit.
Storage pipeline 129 determines the destination address of each storage microoperation and stores it in the corresponding storage element of storage queue 127. The destination address is also supplied to second overtime detecting comparator 139, which accesses the cache line addresses CA in ownership queue 101. When the newly determined destination address matches any of the cache line addresses in ownership queue 101, overtime detecting comparator 139 sets the overtime bit of the corresponding storage element in ownership queue 101. In addition, the matching ownership index value is provided to second overtime detector 145, which uses it to access the corresponding storage element in ownership queue 101 and read its execution bit EXB. When the execution bit of that storage element is true, overtime detector 145 marks the storage element of the conflicting storage microoperation in resequencing buffer 121 (or causes it to be marked) with an exception of the second type.
As mentioned earlier, overriding detector 141 receives the winding bit value and the ownership index value of each microoperation issued from register alias table 117 and uses the ownership index to read the winding bit value of the corresponding storage element in ownership queue 101. When the winding bit WB of that storage element does not match the winding bit of the microoperation, overriding detector 141 marks the microoperation (or causes it to be marked) with an exception of the first type. In addition, first overtime detector 143 receives the ownership index value and the across-vertical bit value of each microoperation issued from register alias table 117 and uses the ownership index value to read the overtime bit of the corresponding storage element in ownership queue 101. When the across-vertical bit of the microoperation is true, indicating an across-vertical microoperation, overtime detector 143 also reads the overtime bit of the next sequential storage element in ownership queue 101. When any overtime bit of the accessed storage elements is true, overtime detector 143 marks the microoperation (or causes it to be marked) with an exception of the first type.
A first storage element is shown at the top of ownership queue 101, with winding bit WB = 1b, ownership index value 00000b, execution bit EXB = 0b, valid bit 1b, cache line address CA_33 and overtime bit STB = 0b. A second storage element, below the first, has winding bit WB = 1b, ownership index value 00001b, execution bit EXB = 0b, valid bit 1b, cache line address CA_34 and overtime bit STB = 0b. A third storage element, below the second, has winding bit WB = 0b, ownership index value 00010b, execution bit EXB = 0b, valid bit 0b, cache line address CA_03 and overtime bit STB = 0b. Toward the end of ownership queue 101, the last five storage elements hold cache line addresses CA_28 to CA_32, with ownership index values 11011b to 11111b respectively. The storage element with cache line address CA_28 has an execution bit, a valid bit and an overtime bit that are all 0b. The next three storage elements, with cache line addresses CA_29 to CA_31, each have an execution bit of 1b and a valid bit of 1b. The storage elements with cache line addresses CA_29 and CA_31 have an overtime bit of 0b, whereas the storage element with cache line address CA_30 has an overtime bit of 1b. The last storage element, with cache line address CA_32, is valid but not yet executing, and is marked as overtime.
In the first pass, cache line addresses CA_1 to CA_32 fill ownership queue 101 with a winding bit of 0b. At the start of the second pass, the first two storage elements of the first pass are overwritten by the storage elements with cache line addresses CA_33 and CA_34 and ownership index values 00000b and 00001b respectively, each with a winding bit WB of 1b. These new storage elements (33 and 34) are valid, but no microoperation from them has been issued for execution yet. The third to the twenty-eighth storage elements have been invalidated (possibly because they have completed). The twenty-ninth and thirty-first storage elements are valid, and each has at least one microoperation in execution. The thirtieth storage element is valid and has at least one microoperation still in execution, but has been marked as overtime. No microoperation of the thirty-second storage element has yet been issued from register alias table 117, so it is not yet marked as executing, but its overtime bit has been set to indicate a conflict or hit with a storage instruction.
When the fetch unit 107 has counted the ownership index value up to 11111b with the wrap bit WB at 0b, as indicated by the last entry of the ownership queue 101 holding cache line address CA_32 (the first pass), it sets the wrap bit to 1b, resets the ownership index to 00000b, and restarts counting, as indicated by the entry holding cache line address CA_33 (the start of the second pass). The wrap bit WB of the next 31 entries written by the fetch unit 107 remains 1b until the ownership index is again reset to 00000b, and operation repeats in this manner. When a loop is detected, the loop queue 111 no longer receives macro-ops from the decoder 109, but the fetch unit 107 continues to read cache lines from the instruction cache 105 into the ownership queue 101 and the decoder 109, so entries of the ownership queue 101 that correspond to loop instructions may be overwritten by the fetch unit 107. In that case, the processor 100 can no longer detect self-modifying code for those cache lines. The wrap bit WB of a micro-op that is issued from the register alias table 117 and belongs to the loop then no longer equals the wrap bit of the entry that was overwritten in the ownership queue 101. When the wrap bit value of an issued micro-op does not match the wrap bit value of the corresponding entry in the ownership queue 101, the overwrite detector 141 detects that the entry has been overwritten and marks the micro-op (or causes it to be marked) for a first-type exception. This holds even if the entry in the ownership queue 101 has been marked invalid or popped from the queue. An invalid or popped entry remains in the ownership queue 101 until it is overwritten.
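The wrap behaviour described above can be sketched as follows. This is a minimal illustration, not the patented circuit: it assumes a 32-entry queue and models the ownership index and wrap bit WB as one 6-bit counter whose top bit is WB. The names `Entry`, `OwnershipCounter`, `push_lines`, and `overwritten` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

QUEUE_SIZE = 32  # number of ownership-queue entries in the example above


@dataclass
class Entry:
    cache_line_addr: int = 0
    wrap: int = 0        # wrap bit WB stored with the entry
    valid: bool = False


class OwnershipCounter:
    """Hypothetical 6-bit counter: low 5 bits = ownership index, top bit = WB."""

    def __init__(self) -> None:
        self.count = 0

    def next_tag(self) -> Tuple[int, int]:
        index = self.count % QUEUE_SIZE          # 00000b .. 11111b
        wrap = self.count // QUEUE_SIZE          # toggles on each pass
        self.count = (self.count + 1) % (2 * QUEUE_SIZE)
        return index, wrap


def push_lines(queue: List[Entry], counter: OwnershipCounter, addrs) -> list:
    """Write each cache line address into the entry selected by the index."""
    tags = []
    for addr in addrs:
        index, wrap = counter.next_tag()
        queue[index] = Entry(addr, wrap, True)   # overwrites any older entry
        tags.append((index, wrap))
    return tags


def overwritten(queue: List[Entry], index: int, uop_wrap: int) -> bool:
    """Overwrite check: the micro-op's WB no longer matches the entry's WB."""
    return queue[index].wrap != uop_wrap


queue = [Entry() for _ in range(QUEUE_SIZE)]
counter = OwnershipCounter()
first_pass = push_lines(queue, counter, range(1, 33))   # CA_1 .. CA_32
second_pass = push_lines(queue, counter, [33, 34])      # CA_33, CA_34
```

A micro-op tagged during the first pass with index 00000b and WB 0b now fails the comparison, because entry 0 holds CA_33 with WB 1b, whereas a micro-op tagged with index 00010b still matches.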
Fig. 3 is a flowchart illustrating operation of the processing front end 104 according to one embodiment. At a first block 301, cache lines are read (e.g., from the system memory 102) and stored in the instruction cache 105, for example by the prefetch engine 103. At next block 303, a wrap bit and an ownership index value are determined for the next cache line, for example by the fetch unit 107, and this information is pushed together with the cache line address into the next available entry of the ownership queue 101. The fetch unit 107 also sets the valid bit of that entry. As previously described, the ownership queue 101 is implemented, for example, as a circular buffer, and the valid bits determine which entries of the ownership queue 101 are currently valid at any point in time. In an alternative embodiment, push and pop pointers may be used instead.
As shown at next block 305, when a new cache line address is pushed into the ownership queue 101, the new cache line address is compared with each valid destination address in the store queue 127. As shown at next query block 307, if a hit is detected, then at block 309 the stale bit STB of the entry receiving the new cache line address is set. After the stale bit is set, or if there is no hit, operation for the ownership queue 101 is complete.
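Blocks 305 through 309 can be sketched as below. This is only an illustration under stated assumptions: a 64-byte cache line (`LINE_BYTES`), byte-granular store destination addresses compared at cache line granularity, and the hypothetical names `StoreQueueEntry`, `OwnqEntry`, and `push_with_store_check`. The field `stale` stands for the bit labelled STB in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 64  # assumed cache line size; the text does not state one


@dataclass
class StoreQueueEntry:
    valid: bool
    dest_addr: Optional[int]  # None until the store address has been determined


@dataclass
class OwnqEntry:
    cache_line_addr: int
    valid: bool = True
    stale: bool = False       # the bit labelled STB in the text


def same_line(a: int, b: int) -> bool:
    """True when two byte addresses fall in the same cache line."""
    return a // LINE_BYTES == b // LINE_BYTES


def push_with_store_check(cache_line_addr: int,
                          store_queue: List[StoreQueueEntry]) -> OwnqEntry:
    """Blocks 305-309: compare the incoming line with every valid store target."""
    entry = OwnqEntry(cache_line_addr)
    for store in store_queue:
        if (store.valid and store.dest_addr is not None
                and same_line(store.dest_addr, cache_line_addr)):
            entry.stale = True   # block 309: a pending store hits this line
            break
    return entry


store_queue = [
    StoreQueueEntry(True, 0x1010),   # valid store into line 0x1000
    StoreQueueEntry(False, 0x2000),  # invalid entry, ignored
    StoreQueueEntry(True, None),     # address not yet determined
]
hit = push_with_store_check(0x1000, store_queue)
miss = push_with_store_check(0x2000, store_queue)
```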
Meanwhile, as shown at block 311, when a new cache line address is pushed into the ownership queue 101, the corresponding cache line data is provided to the decoder 109 together with the wrap bit and the ownership index. At next block 313, the decoder 109 parses the macro-ops in the cache line, and attaches to each macro-op the wrap bit and ownership index of the cache line that contains it. In addition, the decoder 109 determines whether a macro-op straddles two cache lines, that is, whether the macro-op begins in one cache line and ends in the next sequential cache line. If so, the straddle bit of the macro-op is set. At this point, each macro-op carries a wrap bit value, an ownership index value, and a straddle bit value.
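The cross-line check made in block 313 can be sketched as follows. This is a hedged illustration: the 64-byte line size and the helper names `straddles` and `tag_macro_op` are assumptions, and the macro-op is represented as a plain dictionary for brevity.

```python
LINE_BYTES = 64  # assumed cache line size; the text does not state one


def straddles(start_addr: int, length: int) -> bool:
    """True when a macro-op starts in one cache line and ends in the next."""
    first_line = start_addr // LINE_BYTES
    last_line = (start_addr + length - 1) // LINE_BYTES
    return last_line != first_line


def tag_macro_op(start_addr: int, length: int,
                 ownership_index: int, wrap: int) -> dict:
    """Attach the tags named in block 313 to a decoded macro-op."""
    return {
        "ip": start_addr,
        "owni": ownership_index,   # index of the line where the op starts
        "wrap": wrap,
        "straddle": straddles(start_addr, length),
    }


# A 4-byte macro-op at 0x103E covers 0x103E..0x1041 and crosses 0x1040.
crossing = tag_macro_op(0x103E, 4, ownership_index=0, wrap=0)
# A 4-byte macro-op at 0x103C ends at 0x103F, the last byte of its line.
contained = tag_macro_op(0x103C, 4, ownership_index=0, wrap=0)
```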
As shown at block 315, the macro-ops are then pushed into the loop queue 111 and, as shown at block 317, are then provided to the instruction translator 115, which translates them into corresponding micro-ops. As previously described, each macro-op is translated into one or more micro-ops, and each micro-op carries the wrap bit value, ownership index value, and straddle bit value of the macro-op from which it was translated. The instruction pointer of each micro-op may also be incorporated into the micro-op at this point; in another example, the instruction pointer is incorporated at block 319 or 321. In any of these configurations, the instruction pointer is eventually entered into the reorder buffer 121 along with each micro-op. At next block 319, the micro-ops are provided to the register alias table 117, which generates dependency information for each micro-op based on program order, operands, and renaming information. At block 321, the register alias table 117 identifies and marks each micro-op that is the last one of a cache line, for example by setting the field L to true as in the embodiment described above. This information is passed to the reorder buffer 121 and stored in the corresponding entry of the reorder buffer 121, so that the retire module 135 can recognize when all instructions of each cache line have been processed. The micro-ops are then issued from the register alias table 117 for execution and for the ownership and exception handling described below.
Fig. 4 is a flowchart illustrating ownership and exception handling according to one embodiment. At a first block 401, the register alias table 117 issues each micro-op to the reorder buffer 121 and the scheduler 123. In addition, each store micro-op is also pushed into the store queue 127. Operation proceeds to block 402, in which the ownership index of the micro-op issued from the register alias table 117 is used to access the corresponding entry in the ownership queue 101. In the foregoing description of the processor 100 this operation is attributed to several functional blocks, but it may instead be centralized in common logic. When a micro-op is issued from the register alias table 117, operation then proceeds to three separate blocks: block 403, block 405, and block 411.
At block 403, the executing bit EXB of the accessed entry is set. In addition, if the straddle bit of the micro-op is also true, the next sequential entry in the ownership queue 101 is accessed as well, and its executing bit is also set. At this point, at least one cache line accessed by the micro-op in the ownership queue 101 is marked as executing, meaning that at least one micro-op of that cache line has been issued for execution. Once one or both executing bits have been set, this branch of the flowchart is complete.
At block 405, the wrap bit WB of the accessed entry is retrieved and compared with the wrap bit WB of the micro-op. If the wrap bit WB of the micro-op does not match the wrap bit WB of the corresponding entry in the ownership queue 101, as determined at next query block 407, operation proceeds to block 409, where the micro-op is marked for a first-type exception (e.g., by setting T1 to true). After marking (on a mismatch), or if the wrap bits WB are determined to match, operation for this branch of the flowchart is complete.
At block 411, the stale bit STB of the entry accessed in the ownership queue 101 is retrieved. In addition, if the straddle bit of the micro-op is true, the stale bit of the next sequential entry of the ownership queue 101 is also retrieved. At block 413, it is determined whether a stale bit is set. If either of the two stale bits is set, operation proceeds to block 409, where the micro-op is marked for a first-type exception (e.g., by setting T1 to true). After the micro-op has been marked for the first-type exception at block 409, or if neither stale bit is set, operation for this branch of the flowchart is complete.
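The three branches that follow block 402 (blocks 403, 405 to 409, and 411 to 413) can be sketched together as one function. This is an illustrative model only: the hardware evaluates the branches in parallel, the modulo wrap-around used to find the next sequential entry is an assumption, and `OwnqEntry`, `MicroOp`, and `on_issue` are hypothetical names. The fields `stale` and `executing` stand for the bits labelled STB and EXB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OwnqEntry:
    wrap: int = 0
    stale: bool = False       # STB
    executing: bool = False   # EXB


@dataclass
class MicroOp:
    owni: int                 # ownership index carried by the micro-op
    wrap: int                 # wrap bit WB carried by the micro-op
    straddle: bool = False    # set when the op crosses into the next line
    t1: bool = False          # marked for a first-type exception


def on_issue(queue: List[OwnqEntry], uop: MicroOp) -> MicroOp:
    """Checks performed when a micro-op is issued (blocks 402-413)."""
    entries = [queue[uop.owni]]                     # block 402
    if uop.straddle:
        # Assumed: the next sequential entry wraps around the circular queue.
        entries.append(queue[(uop.owni + 1) % len(queue)])
    for entry in entries:
        entry.executing = True                      # block 403
    if queue[uop.owni].wrap != uop.wrap:            # blocks 405-407
        uop.t1 = True                               # block 409
    if any(entry.stale for entry in entries):       # blocks 411-413
        uop.t1 = True                               # block 409
    return uop


q = [OwnqEntry() for _ in range(4)]
q[1].stale = True
straddling = on_issue(q, MicroOp(owni=0, wrap=0, straddle=True))

q2 = [OwnqEntry() for _ in range(4)]
q2[1].stale = True
contained = on_issue(q2, MicroOp(owni=0, wrap=0))
```

In the usage lines, the straddling micro-op is marked because the second line it touches is stale, while the same micro-op without the straddle bit is not, which is the case the cross-line bit exists to catch.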
As previously described, each micro-op issued to the scheduler 123 is eventually dispatched to a corresponding one of the execution units 125 when it is ready to execute. This includes dispatching store micro-ops to the store pipeline 129, as shown at block 415. At next block 417, the store pipeline 129 determines the destination address of the store micro-op and updates the corresponding entry in the store queue 127. At next block 419, as each new destination address is determined, it is compared with the valid cache line addresses in the ownership queue 101. At block 421, it is determined whether the new destination address matches a valid cache line address. If the new destination address does not match any valid cache line address in the ownership queue 101, operation is complete.
If the new destination address matches a valid cache line address, operation proceeds to block 423, where the stale bit of each matching entry is set. In addition, the ownership index of each matching entry is forwarded to the stale detector 145. At next block 425, the stale detector 145 accesses the corresponding entry using the provided ownership index in order to retrieve the executing bit EXB of that entry. At next query block 427, if the executing bit EXB is determined to be true, operation proceeds to block 429, where the colliding store micro-op is marked for a second-type exception (e.g., by setting T2 to true). Operation is complete when the executing bit EXB is determined to be false at block 427, or after the store micro-op has been marked at block 429.
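Blocks 417 through 429 can be sketched as follows. This is a minimal model under assumptions: a 64-byte cache line, comparison at line granularity, and the hypothetical names `OwnqEntry`, `StoreMicroOp`, and `on_store_address`. The fields `stale` and `executing` stand for STB and EXB.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_BYTES = 64  # assumed cache line size; the text does not state one


@dataclass
class OwnqEntry:
    cache_line_addr: int
    valid: bool = True
    stale: bool = False       # STB
    executing: bool = False   # EXB


@dataclass
class StoreMicroOp:
    dest_addr: Optional[int] = None
    t2: bool = False          # marked for a second-type exception


def on_store_address(queue: List[OwnqEntry], store: StoreMicroOp,
                     dest_addr: int) -> StoreMicroOp:
    """Handle a newly determined store destination address (blocks 417-429)."""
    store.dest_addr = dest_addr                                # block 417
    for entry in queue:                                        # block 419
        matches = (entry.valid and
                   entry.cache_line_addr // LINE_BYTES == dest_addr // LINE_BYTES)
        if matches:                                            # block 421
            entry.stale = True                                 # block 423
            if entry.executing:                                # blocks 425-427
                store.t2 = True                                # block 429
    return store


ownq = [
    OwnqEntry(0x1000, executing=True),   # line with micro-ops already issued
    OwnqEntry(0x1040),                   # fetched, nothing issued yet
    OwnqEntry(0x1080, valid=False),      # invalidated entry
]
quiet_hit = on_store_address(ownq, StoreMicroOp(), 0x1044)
executing_hit = on_store_address(ownq, StoreMicroOp(), 0x1008)
no_hit = on_store_address(ownq, StoreMicroOp(), 0x1088)
```

A store into a line that has been fetched but not issued only sets the stale bit, so later micro-ops of that line pick up the first-type exception at issue; a store into a line that is already executing additionally marks the store itself.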
Fig. 5 is a flowchart illustrating execution, retirement, and exception handling according to one embodiment. At a first block 501, micro-ops are dispatched from the scheduler 123 to the execution units 125 as previously described. A dispatched micro-op is normally executed, although under certain operating conditions it is not. At next block 503, the retire module 135 of the reorder buffer 121 identifies the next micro-op to retire. At next query block 505, it is determined (e.g., by the retire module 135) whether the field T1 of the micro-op to be retired is set to true. If so, operation proceeds to block 507, where the first-type exception is performed, which includes flushing the processor 100. In addition, the micro-op that caused the first-type exception is fetched again from the instruction cache 105, as previously described. Exception handling is then complete.
As shown at next query block 509, if T1 is not true but T2 is determined to be true (e.g., by the retire module 135), operation proceeds to block 511, where the second-type exception is performed: the store micro-op is allowed to complete and retire, and the processor 100 is then flushed. After the store micro-op has triggered the exception, operation resumes at the next instruction from the instruction cache 105. Exception handling is then complete. At block 513, if neither T1 nor T2 is true, the micro-op is allowed to retire. At block 514, if the field L of the micro-op is set to true, designating it as the last micro-op of a cache line, then at block 515 the retire module 135 instructs the ownership queue 101 to invalidate the corresponding entry, and operation is complete. The invalidation is done, for example, by marking the entry as invalid or by popping the entry off the stack of entries in the ownership queue 101. If the field L is false, operation is complete after the instruction retires.
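The retirement decisions of blocks 505 through 515 can be summarized in a short sketch. It only returns which action is taken and leaves the flush and re-fetch themselves abstract; `Action`, `OwnqEntry`, `MicroOp`, and `retire` are hypothetical names, and leaving the queue untouched on the T1 and T2 paths is an assumption, since the text does not say what happens to the entry there.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    FLUSH_AND_REFETCH = 1   # first type: flush, re-fetch the marked op
    RETIRE_THEN_FLUSH = 2   # second type: store retires, then flush
    RETIRE = 3              # normal retirement


@dataclass
class OwnqEntry:
    valid: bool = True


@dataclass
class MicroOp:
    owni: int
    t1: bool = False
    t2: bool = False
    last: bool = False      # field L: last micro-op of its cache line


def retire(queue: List[OwnqEntry], uop: MicroOp) -> Action:
    """Decide how the next micro-op retires (blocks 505-515)."""
    if uop.t1:                            # block 505
        return Action.FLUSH_AND_REFETCH   # block 507: the op does not retire
    if uop.t2:                            # block 509
        return Action.RETIRE_THEN_FLUSH   # block 511: resume after the store
    if uop.last:                          # block 514
        queue[uop.owni].valid = False     # block 515: release the entry
    return Action.RETIRE                  # block 513


ownq = [OwnqEntry(), OwnqEntry()]
a1 = retire(ownq, MicroOp(owni=0, t1=True, t2=True))   # T1 is checked first
a2 = retire(ownq, MicroOp(owni=0, t2=True))
a3 = retire(ownq, MicroOp(owni=1, last=True))
```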
The foregoing description is presented to enable one of ordinary skill in the art to make and use the present invention as provided within the context of a particular application and its requirements. Although the present invention has been described in considerable detail with reference to certain preferred versions thereof, other versions and variations are possible and contemplated. Various modifications to the embodiments described above will be apparent to one of ordinary skill in the art, and the general principles defined herein may readily be applied to other embodiments. For example, the circuits described herein may be implemented in any suitable manner, such as with logic devices or similar circuitry.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope. Anyone skilled in the art may make further improvements and changes on this basis without departing from the spirit and scope of the present invention; the scope of protection of the present invention is therefore defined by the following claims.

Claims (21)

1. A processor that determines memory ownership on a cache line basis for detecting self-modifying code, including code with an instruction that overlaps a cache line boundary, characterized in that the processor comprises:
an ownership queue comprising a plurality of entries;
a fetch system that provides cache line data of a plurality of cache lines to a processing front end, wherein, for each cache line, the fetch system determines an ownership index and enters the ownership index together with a corresponding cache line address into one of the plurality of entries of the ownership queue;
wherein the processing front end translates the cache line data of the plurality of cache lines into a plurality of instructions, sets a straddle bit for each instruction derived from cache line data that overlaps two cache lines, issues each of the plurality of instructions for execution, and includes with each issued instruction the ownership index of the corresponding entry in the ownership queue;
a stale detection system that sets a stale bit of any entry of the plurality of entries in the ownership queue that collides with a store instruction, and that marks an issued instruction to invoke a first exception when the stale bit of the corresponding entry is set, or when both the straddle bit of the issued instruction and the stale bit of the next sequential entry after the corresponding entry in the ownership queue are set; and
an execution system that performs the first exception when an instruction that is about to retire is marked to invoke the first exception.
2. The processor according to claim 1, characterized in that the first exception causes the execution system to flush the processor, to prevent the instruction that caused the first exception from retiring, and to cause the fetch system to re-fetch, from an instruction cache, the instruction that caused the first exception.
3. The processor according to claim 1, characterized in that the execution system further determines a destination address of each issued store instruction;
wherein the stale detection system comprises:
a comparator that compares each cache line address being entered into the ownership queue with each destination address that has been determined, and that sets the stale bit of the entry being entered when a match is found; and
a stale detector that uses the ownership index of an issued instruction to read the stale bit of the corresponding entry in the ownership queue and, when the straddle bit of the issued instruction is set, also uses that ownership index to read the stale bit of the next sequential entry after the corresponding entry in the ownership queue, and that causes the issued instruction to be marked to invoke the first exception when the stale bit of the corresponding entry or the stale bit of the next sequential entry is set.
4. The processor according to claim 1, characterized in that:
the processing front end uses the ownership index of an issued instruction to access the corresponding entry in the ownership queue and set an executing bit of the corresponding entry and, when the straddle bit of the issued instruction is set, to set the executing bit of the next sequential entry after the corresponding entry;
the execution system determines a destination address of each issued store instruction;
wherein the stale detection system comprises:
a comparator that, when each destination address is determined by the execution system, compares the destination address with each cache line address stored in valid entries of the ownership queue, and sets the stale bit of each matching entry when a match is found; and
a stale detector that evaluates the executing bit of each matching entry determined by the comparator, and that causes the store instruction corresponding to the determined destination address to be marked to invoke a second exception when the executing bit of any matching entry is set;
wherein, when a store instruction that is about to retire is marked to invoke the second exception, the execution system performs the second exception, and the second exception causes the execution system to allow the store instruction marked to invoke the second exception to retire, to flush the processor, and to cause the fetch system to retrieve an instruction pointer in order to fetch, from an instruction cache, the instruction following the store instruction.
5. The processor according to claim 1, characterized in that:
the execution system determines a destination address of each issued store instruction;
wherein the stale detection system comprises:
a first comparator that compares each cache line address being entered into the ownership queue with each destination address that has been determined, and that sets the stale bit of the entry being entered when a match is found; and
a second comparator that, when each destination address is determined by the execution system, compares the destination address with each cache line address stored in valid entries of the ownership queue, and that sets the stale bit of each matching entry when a match is found.
6. The processor according to claim 5, characterized in that the processor uses the ownership index of an issued instruction to access the corresponding entry in the ownership queue in order to set an executing bit of the corresponding entry;
wherein, when the straddle bit of the issued instruction is set, the executing bit of the next sequential entry after the corresponding entry is also set.
7. The processor according to claim 6, characterized in that the stale detection system further comprises:
a stale detector that evaluates the executing bit of each matching entry determined by the second comparator, and that causes the store instruction corresponding to the determined destination address to be marked to invoke a second exception to be performed when the executing bit of any matching entry is set; and
wherein the execution system performs the second exception when a store instruction that is about to retire is marked to invoke the second exception.
8. The processor according to claim 7, characterized in that the second exception causes the execution system to allow the store instruction marked to invoke the second exception to retire, to flush the processor, and to cause the fetch system to retrieve an instruction pointer in order to fetch, from an instruction cache, the instruction following the store instruction.
9. The processor according to claim 5, characterized in that the execution system further comprises:
a store queue comprising a plurality of entries, wherein each entry stores a store instruction issued from the processing front end and stores a destination address; and
a store pipeline that determines the destination address of each store instruction dispatched for execution, and that provides each determined destination address to the corresponding entry in the store queue and to the second comparator.
10. The processor according to claim 1, characterized in that:
when a cache line address is entered, the fetch system marks the corresponding entry in the ownership queue as valid;
wherein the processing front end marks the last instruction of the corresponding entry of the plurality of entries in the ownership queue as a last instruction; and
when a retiring instruction is marked as the last instruction, the execution system invalidates the corresponding entry of the plurality of entries in the ownership queue.
11. The processor according to claim 1, characterized in that:
the fetch system determines the ownership index as a binary count value that is incremented as each entry is entered into the ownership queue, the total count of the binary count value being at least the total number of entries in the ownership queue;
wherein the most significant bit of the ownership index comprises a wrap bit; and
the processor further comprises an overwrite detector that uses the ownership index of an issued instruction to read the wrap bit of the corresponding entry in the ownership queue, and that causes the issued instruction to be marked to invoke the first exception when the wrap bit of the corresponding entry does not match the wrap bit of the issued instruction.
12. A method of determining memory ownership on a cache line basis for detecting self-modifying code, including code with an instruction that overlaps a cache line boundary, characterized by comprising:
fetching a plurality of cache lines and determining an ownership index for each of the plurality of cache lines, wherein each cache line comprises a cache line address and cache line data;
pushing each cache line, with its corresponding ownership index, into one of a plurality of entries of an ownership queue;
translating the cache line data from the plurality of cache lines into a plurality of instructions;
setting a straddle bit for each instruction derived from cache line data that overlaps two cache lines;
issuing each of the plurality of instructions for execution, and including with each issued instruction the ownership index of the corresponding entry in the ownership queue and the corresponding straddle bit;
setting a stale bit of any entry of the plurality of entries in the ownership queue that collides with a store instruction;
marking an issued instruction to invoke a first exception when the stale bit of the corresponding entry is set, or when both the straddle bit of the issued instruction and the stale bit of the next sequential entry after the corresponding entry in the ownership queue are set; and
performing the first exception when an instruction that is about to retire is marked to invoke the first exception.
13. The method according to claim 12, characterized in that the step of performing the first exception comprises:
preventing the instruction that invoked the first exception from retiring;
flushing the processor; and
re-fetching the instruction that invoked the first exception.
14. The method according to claim 12, characterized by further comprising:
determining a destination address of each issued store instruction;
comparing each cache line address being entered into the ownership queue with each destination address that has been determined, and setting the stale bit of the entry being entered when a match is found;
using the ownership index of an issued instruction to read the stale bit of the corresponding entry in the ownership queue and, when the straddle bit of the issued instruction is set, also using the ownership index of the issued instruction to read the stale bit of the next sequential entry after the corresponding entry in the ownership queue; and
marking the issued instruction to invoke the first exception when the stale bit of the corresponding entry in the ownership queue or the stale bit of the next sequential entry is set.
15. The method according to claim 12, characterized by further comprising:
using the ownership index of an issued instruction to access the corresponding entry in the ownership queue and set an executing bit of the corresponding entry and, when the straddle bit of the issued instruction is set, setting the executing bit of the next sequential entry after the corresponding entry;
determining a destination address of each issued store instruction;
when each destination address is determined, comparing the destination address with each cache line address stored in valid entries of the ownership queue, and setting the stale bit of each matching entry when a match is found;
marking the store instruction corresponding to the determined destination address to invoke a second exception when the executing bit of any matching entry is set; and
performing the second exception when a store instruction that is about to retire is marked to invoke the second exception, wherein the step of performing the second exception comprises:
allowing the store instruction marked to invoke the second exception to retire;
flushing the processor; and
retrieving an instruction pointer in order to fetch, from an instruction cache, the instruction following the store instruction.
16. The method according to claim 12, characterized by further comprising:
determining a destination address of each issued store instruction;
comparing each cache line address being entered into the ownership queue with each destination address that has been determined, and setting the stale bit of the entry being entered when a match is found; and
when each destination address is determined by an execution system, comparing the destination address with each cache line address stored in valid entries of the ownership queue, and setting the stale bit of each matching entry when a match is found.
17. The method according to claim 16, characterized by further comprising:
using the ownership index of an issued instruction to access the corresponding entry in the ownership queue and set an executing bit of the corresponding entry and, when the straddle bit of the issued instruction is set, setting the executing bit of the next sequential entry after the corresponding entry.
18. The method according to claim 17, characterized by further comprising:
marking the store instruction corresponding to the determined destination address to invoke a second exception to be performed when the executing bit of any matching entry is set; and
performing the second exception when a store instruction that is about to retire is marked to invoke the second exception.
19. The method according to claim 18, characterized in that the step of performing the second exception further comprises:
allowing the store instruction marked to invoke the second exception to retire;
flushing the processor; and
retrieving an instruction pointer in order to fetch, from an instruction cache, the instruction following the store instruction.
20. The method according to claim 12, characterized by further comprising:
marking the corresponding entry in the ownership queue as valid when a cache line address is entered;
marking the last instruction of the corresponding entry of the plurality of entries in the ownership queue as a last instruction; and
invalidating the corresponding entry of the plurality of entries of the ownership queue when a retiring instruction is marked as the last instruction.
21. The method according to claim 12, characterized in that:
the step of determining the ownership index comprises determining the ownership index as a binary count value that is incremented as each entry is entered into the ownership queue, the total count of the binary count value being at least the total number of entries in the ownership queue, wherein the most significant bit of the ownership index comprises a wrap bit;
and the method further comprises using the ownership index of an issued instruction to read the wrap bit of the corresponding entry in the ownership queue; and
marking the issued instruction to invoke the first exception when the wrap bit of the corresponding entry does not match the wrap bit of the issued instruction.
CN201710137889.1A 2016-04-20 2017-03-09 Detect the processor and method of modification program code Active CN106933537B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201662324945P 2016-04-20 2016-04-20
US62/324,945 2016-04-20
US15/156,403 2016-05-17
US15/156,403 US9792216B1 (en) 2016-04-20 2016-05-17 System and method of determining memory ownership on cache line basis for detecting self-modifying code including code with instruction that overlaps cache line boundaries

Publications (2)

Publication Number Publication Date
CN106933537A CN106933537A (en) 2017-07-07
CN106933537B true CN106933537B (en) 2019-03-08

Family

ID=59433835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710137889.1A Active CN106933537B (en) 2016-04-20 2017-03-09 Detect the processor and method of modification program code

Country Status (1)

Country Link
CN (1) CN106933537B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 301, 2537 Jinke Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201203

Patentee after: Shanghai Zhaoxin Semiconductor Co.,Ltd.

Address before: Room 301, 2537 Jinke Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201203

Patentee before: VIA ALLIANCE SEMICONDUCTOR Co.,Ltd.
