CN107729055A - Microprocessor and its execution method - Google Patents

Info

Publication number
CN107729055A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710978680.8A
Other languages
Chinese (zh)
Other versions
CN107729055B (en
Inventor
G. Glenn Henry
Stephan Gaskins
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/281,796 (US9575541B2)
Application filed by Via Technologies Inc
Publication of CN107729055A
Application granted
Publication of CN107729055B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30079 Pipeline control instructions, e.g. multicycle NOP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3237 Power saving characterised by the action undertaken by disabling clock generation or distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3287 Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3296 Power saving characterised by the action undertaken by lowering the supply or operating voltage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/24 Handling requests for interconnection or transfer for access to input/output bus using interrupt
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087 Synchronisation or serialisation instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Microcomputers (AREA)
  • Power Sources (AREA)

Abstract

The present invention provides a microprocessor and a method performed by it. The microprocessor includes a plurality of processing cores, each of which instantiates a respective architecturally visible storage resource. A first core of the plurality of processing cores encounters an architectural instruction that instructs the first core to update its respective architecturally visible storage resource with a value specified by the architectural instruction. In response to encountering the architectural instruction, the first core provides the value to each of the plurality of processing cores and updates its own respective architecturally visible storage resource with the value. Each processing core other than the first core updates its own respective architecturally visible storage resource with the value provided by the first core, without itself encountering the architectural instruction.

Description

Microprocessor and its execution method
This application is a divisional application of the application filed on August 28, 2014, with Application No. 201410431675.1, entitled "Microprocessor and its execution method".
Technical field
The present invention relates to a microprocessor, and in particular to the propagation of updates to architecturally visible storage that is instantiated in each core.
Background
The use of multi-core microprocessors has increased, primarily because they offer a performance advantage. This has been made possible largely by the rapid reduction in semiconductor device geometries, which has increased transistor density. The presence of multiple cores in one microprocessor creates a need for a core to communicate with the other cores in order to accomplish various functions, such as power management, cache memory management, debugging, and configuration related to the multiple cores.
Traditionally, architectural programs running on a multi-core processor (for example, an operating system or application programs) communicate using semaphores located in system memory that is architecturally addressable by all the cores. This may be sufficient for many purposes, but it may not provide the speed, precision, and/or system-level transparency that other purposes require.
Summary of the invention
The present invention provides a microprocessor. The microprocessor includes a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally visible storage resource. A first core of the plurality of processing cores is configured to encounter an architectural instruction that instructs the first core to update its respective architecturally visible storage resource with a value specified by the architectural instruction. In response to encountering the architectural instruction, the first core is further configured to provide the value to each of the plurality of processing cores and to update its own respective architecturally visible storage resource with the value. Each of the plurality of processing cores other than the first core is configured to update its own respective architecturally visible storage resource with the value provided by the first core, without itself encountering the architectural instruction.
The present invention also provides a method performed in a microprocessor having a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally visible storage resource. The method includes: encountering, by a first core of the plurality of processing cores, an architectural instruction that instructs the first core to update its respective architecturally visible storage resource with a value specified by the architectural instruction; providing, by the first core, the value to each of the plurality of processing cores in response to encountering the architectural instruction; updating, by the first core, its respective architecturally visible storage resource with the value in response to encountering the architectural instruction; and updating, by each of the plurality of processing cores other than the first core, its own respective architecturally visible storage resource with the value provided by the first core, without encountering the architectural instruction.
The present invention provides a computer program product encoded in at least one non-transitory computer usable medium for use with a computing device. The computer program product includes computer usable program code that specifies a microprocessor. The computer usable program code includes first program code that specifies a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally visible storage resource. A first core of the plurality of processing cores is configured to encounter an architectural instruction that instructs the first core to update its respective architecturally visible storage resource with a value specified by the architectural instruction. In response to encountering the architectural instruction, the first core is further configured to provide the value to each of the plurality of processing cores and to update its own respective architecturally visible storage resource with the value. Each of the plurality of processing cores other than the first core is configured to update its own respective architecturally visible storage resource with the value provided by the first core, without itself encountering the architectural instruction.
The present invention provides a microprocessor, including: a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally visible storage resource; wherein a first core of the plurality of processing cores is configured to: encounter an architectural instruction, wherein the architectural instruction instructs the first core to update its respective architecturally visible storage resource with a value specified by the architectural instruction; and, in response to encountering the architectural instruction, send an update message to each of the other processing cores, provide the value to each of the other processing cores, and update the respective architecturally visible storage resource of the first core with the value; and wherein each of the plurality of processing cores other than the first core is configured to update, in response to the update message, its own respective architecturally visible storage resource with the value provided by the first core, without encountering the architectural instruction.
The present invention also provides a method performed in a microprocessor having a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally visible storage resource. The method includes: encountering, by a first core of the plurality of processing cores, an architectural instruction, wherein the architectural instruction instructs the first core to update its respective architecturally visible storage resource with a value specified by the architectural instruction; sending an update message from the first core to each of the other processing cores in response to encountering the architectural instruction; providing, by the first core, the value to each of the other processing cores in response to encountering the architectural instruction; updating, by the first core, the respective architecturally visible storage resource of the first core with the value in response to encountering the architectural instruction; and updating, by each of the plurality of processing cores other than the first core and in response to the update message, its own respective architecturally visible storage resource with the value provided by the first core, without encountering the architectural instruction.
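The propagation described in this summary can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class names `Core` and `Microprocessor`, the `instructions_seen` counter, and the method names are hypothetical and exist only to make the behaviour observable.

```python
class Core:
    """Hypothetical model of one processing core."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.resource = 0           # architecturally visible storage resource
        self.instructions_seen = 0  # architectural instructions encountered


class Microprocessor:
    """Hypothetical container for the plurality of processing cores."""

    def __init__(self, num_cores):
        self.cores = [Core(i) for i in range(num_cores)]

    def execute_update(self, first_id, value):
        """The core `first_id` encounters an instruction that writes `value`."""
        first = self.cores[first_id]
        first.instructions_seen += 1
        # Send an update message carrying the value to every other core.
        for other in self.cores:
            if other is not first:
                self.on_update_message(other, value)
        # The first core then updates its own copy.
        first.resource = value

    @staticmethod
    def on_update_message(core, value):
        # The receiving core updates its own copy without ever
        # encountering the architectural instruction itself.
        core.resource = value


cpu = Microprocessor(4)
cpu.execute_update(first_id=1, value=0x2A)
```

After the call, all four modelled cores hold the same value, while only core 1 has counted an encountered instruction.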
The present invention achieves lower power consumption.
Brief description of the drawings
Fig. 1 is a block diagram showing a multi-core microprocessor.
Fig. 2 is a block diagram showing a control word, a status word, and a configuration word.
Fig. 3 is a flowchart showing the operation of the control unit.
Fig. 4 is a block diagram showing a microprocessor according to another embodiment.
Fig. 5 is a flowchart showing operation of the microprocessor to dump debug information.
Fig. 6 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 5.
Figs. 7A-7B are a flowchart showing the microprocessor performing a cross-core cache control operation.
Fig. 8 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Figs. 7A-7B.
Fig. 9 is a flowchart showing operation of the microprocessor to enter a low-power package C-state.
Fig. 10 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 9.
Fig. 11 is a flowchart showing operation of the microprocessor to enter a low-power package C-state according to another embodiment of the present invention.
Fig. 12 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 13 is a timing diagram showing another example of operation of the microprocessor according to the flowchart of Fig. 11.
Fig. 14 is a flowchart showing dynamic reconfiguration of the microprocessor.
Fig. 15 is a flowchart showing dynamic reconfiguration of the microprocessor according to another embodiment.
Fig. 16 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Fig. 15.
Fig. 17 is a block diagram showing the hardware semaphore 118 of Fig. 1.
Fig. 18 is a flowchart showing operation when a core 102 reads the hardware semaphore 118.
Fig. 19 is a flowchart showing operation when a core writes the hardware semaphore.
Fig. 20 is a flowchart showing operation when the microprocessor uses the hardware semaphore to perform an operation that requires exclusive ownership of a resource.
Fig. 21 is a timing diagram showing an example of cores issuing non-sleeping synchronization requests according to the flowchart of Fig. 3.
Fig. 22 is a flowchart showing a process for configuring the microprocessor.
Fig. 23 is a flowchart showing a process for configuring the microprocessor according to another embodiment.
Fig. 24 is a block diagram showing a multi-core microprocessor according to another embodiment.
Fig. 25 is a block diagram showing the structure of a microcode patch.
Figs. 26A-26B are a flowchart showing operation of the microprocessor of Fig. 24 to propagate the microcode patch of Fig. 25 to multiple cores of the microprocessor.
Fig. 27 is a timing diagram showing an example of operation of the microprocessor according to the flowchart of Figs. 26A-26B.
Fig. 28 is a block diagram showing a multi-core microprocessor according to another embodiment.
Figs. 29A-29B are a flowchart showing operation of the microprocessor of Fig. 28 to propagate a microcode patch to multiple cores of the microprocessor according to another embodiment.
Fig. 30 is a flowchart showing operation of the microprocessor of Fig. 24 to patch service processor program code.
Fig. 31 is a block diagram showing a multi-core microprocessor according to another embodiment.
Fig. 32 is a flowchart showing operation of the microprocessor of Fig. 31 to propagate an MTRR update to multiple cores of the microprocessor.
The reference symbols in the drawings are briefly described as follows:
100: multi-core microprocessor; 102A, 102B, 102N: core A, core B, core N; 103: non-core; 104: control unit; 106: status register; 108A, 108B, 108C, 108D, 108N: synchronization register; 108E, 108F, 108G, 108H: shadow synchronization register; 114: fuse; 116: private random access memory; 118: hardware semaphore; 119: shared cache memory; 122A, 122B, 122N: clock signal; 124A, 124B, 124N: interrupt signal; 126A, 126B, 126N: data signal; 128A, 128B, 128N: power control signal; 202: control word; 204: wake events; 206: synchronization control; 208: power gate; 212: sleep; 214: selective wake-up; 222: S; 224: C; 226: synchronization condition or C-state; 228: core set; 232: force synchronization; 234: selective synchronization kill; 236: disable core; 242: status word; 244: wake events; 246: lowest common C-state; 248: error code; 252: configuration word; 254-0 to 254-7: enable; 256: local core number; 258: die number; 302, 304, 305, 306, 312, 314, 316, 318, 322, 326, 328, 332, 334, 336: steps; 402A, 402B: inter-die bus unit A, inter-die bus unit B; 404: inter-die bus; 406A, 406B: die A, die B; 502, 504, 505, 508, 514, 516, 518, 524, 526, 528, 532: steps; 702, 704, 706, 708, 714, 716, 717, 718, 724, 726, 727, 728, 744, 746, 747, 748, 749, 752: steps; 902, 904, 906, 907, 908, 909, 914, 916, 919, 921, 924: steps; 1102, 1104, 1106, 1108, 1109, 1121, 1124, 1132, 1134, 1136, 1137: steps; 1402, 1404, 1406, 1408, 1412, 1414, 1416, 1417, 1418, 1422, 1424, 1426: steps; 1502, 1504, 1506, 1508, 1517, 1518, 1522, 1524, 1526, 1532: steps; 1702: owned bit; 1704: owner bits; 1706: state machine; 1802, 1804, 1806, 1808: steps; 1902, 1904, 1906, 1908, 1912, 1914, 1916, 1918: steps; 2002, 2004, 2006, 2008: steps; 2202, 2203, 2204, 2205, 2206, 2208, 2212, 2214, 2216, 2218, 2222, 2224: steps; 2302, 2304, 2305, 2306, 2312, 2315, 2318, 2324: steps; 2404: core microcode read-only memory; 2408: non-core microcode patch random access memory; 2423: service unit; 2425: non-core microcode read-only memory; 2439: patch content-addressable memory; 2497: service unit start address register; 2499: core random access memory; 2500: microcode patch; 2502: header; 2504: immediate patch; 2506: checksum; 2508: CAM data; 2512: core PRAM patch; 2514: checksum; 2516: RAM patch; 2518: non-core PRAM patch; 2522: checksum; 2602, 2604, 2606, 2608, 2611, 2612, 2614, 2616, 2618, 2621, 2622, 2624, 2626, 2628, 2631, 2632, 2634, 2652: steps; 2808: core patch RAM; 2912, 2916, 2922, 2932: steps; 3002, 3004, 3006: steps; 3102: memory type range registers; 3202, 3204, 3206, 3208, 3211, 3212, 3214, 3216, 3218, 3252: steps.
Detailed description
The preferred embodiments of the present invention are described below. The embodiments illustrate the principles of the present invention and are not intended to limit it. The scope of the present invention is defined by the claims.
Refer to Fig. 1, which is a block diagram showing a multi-core microprocessor 100. The microprocessor 100 includes a plurality of processing cores, denoted 102A, 102B through 102N, which are referred to collectively as processing cores 102, or simply cores 102, and individually as processing core 102, or simply core 102. Preferably, each core 102 includes one or more pipelines of functional units (not shown), including an instruction cache, an instruction translation unit or instruction decoder that preferably includes a microcode unit, a register renaming unit, reservation stations, cache memories, execution units, a memory subsystem, and a retire unit that includes a reorder buffer. Preferably, the cores 102 have a superscalar, out-of-order execution microarchitecture. In one embodiment, the microprocessor 100 is an x86 architecture microprocessor, but in other embodiments the microprocessor 100 conforms to another instruction set architecture.
The microprocessor 100 also includes a non-core 103 that is coupled to the cores 102 and is distinct from them. The non-core 103 includes a control unit 104, fuses 114, a private random access memory (PRAM) 116, and a shared cache memory 119, for example a level-2 (L2) and/or level-3 (L3) cache memory shared by the cores 102. Each core 102 is configured to read data from and write data to the non-core 103 via a respective address/data bus 126, which provides a non-architectural address space (also referred to as a private or micro-architectural address space) to the shared resources of the non-core 103. The PRAM 116 is private, or non-architectural, in the sense that it is not in the architectural user program address space of the microprocessor 100. In one embodiment, the non-core 103 includes arbitration logic that arbitrates requests by the cores 102 for access to the resources of the non-core 103.
Each fuse 114 is an electrical device that may be blown or not blown. When not blown, the fuse 114 has low impedance and readily conducts current; when blown, the fuse 114 has high impedance and does not readily conduct current. A sense circuit is associated with each fuse 114 to evaluate it, that is, to sense whether the fuse 114 conducts a high current or low voltage (not blown, for example logical zero or clear) or a low current or high voltage (blown, for example logical one or set). The fuse 114 may be blown during manufacture of the microprocessor 100 and, in some embodiments, an unblown fuse 114 may be blown after manufacture of the microprocessor 100. Preferably, blowing a fuse 114 is irreversible. One example of a fuse 114 is a polysilicon fuse, which may be blown by applying a sufficiently high voltage across the device. Another example of a fuse 114 is a nickel-chromium fuse, which may be blown using a laser. Preferably, the sense circuit senses the fuse 114 at power-up and provides its evaluation to a corresponding bit in a holding register of the microprocessor 100. When the microprocessor 100 is released from reset, the cores 102 (for example, their microcode) read the holding register to determine the sensed values of the fuses 114. In one embodiment, before the microprocessor 100 is released from reset, updated values may be scanned into the holding register via a boundary scan input, for example a Joint Test Action Group (JTAG) input, to effectively update the fuse 114 values. This is particularly useful for testing and/or debugging purposes, for example in the embodiments described below with respect to Fig. 22 and Fig. 23.
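As a minimal sketch of the fuse evaluation just described, the functions below pack sensed fuse states into a holding-register value and let a scanned-in value replace it before reset is released. The function names, the bit ordering (bit i corresponds to fuse i), and the `jtag_override` parameter are assumptions made for illustration; the source does not specify a register layout.

```python
def sense_fuses(blown):
    """Pack sensed fuse states into a holding-register value.

    `blown` is a sequence of booleans; a blown fuse reads as logical
    one (set) and an unblown fuse as logical zero (clear).
    """
    value = 0
    for i, is_blown in enumerate(blown):
        if is_blown:
            value |= 1 << i
    return value


def holding_register_at_reset(blown, jtag_override=None):
    """Return the value the cores read when released from reset.

    If a value was scanned in over the boundary scan input before
    reset release, it replaces the sensed fuse values (assumed here
    to replace the whole register).
    """
    value = sense_fuses(blown)
    if jtag_override is not None:
        value = jtag_override
    return value


def fuse_bit(holding, index):
    """Read one fuse value out of the holding register."""
    return (holding >> index) & 1
```

For instance, `holding_register_at_reset([True, False, True])` yields a register with bits 0 and 2 set, while passing `jtag_override=0b010` makes the cores see only bit 1 set regardless of the physical fuses.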
In addition, in one embodiment, the microprocessor 100 includes a distinct local Advanced Programmable Interrupt Controller (APIC) (not shown) associated with each core 102. In one embodiment, the local APICs conform architecturally to the description of a local APIC in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, May 2012, by the Intel Corporation of Santa Clara, California, particularly Section 10.4. In particular, the local APIC includes an APIC ID and an APIC base register that includes a bootstrap processor (BSP) flag, whose generation and use are described in more detail below, particularly with respect to the embodiments of Fig. 14 through Fig. 16.
The control unit 104 comprises hardware, software, or a combination of hardware and software. The control unit 104 includes a hardware semaphore 118 (described in detail below with respect to Fig. 17 through Fig. 20), a status register 106, a configuration register 112, and a respective synchronization register 108 for each core 102. Preferably, each of the non-core 103 entities is addressable by each core 102 at a distinct address within the non-architectural address space, which enables the microcode of the cores 102 to read and write them.
Each synchronization register 108 is writable by its respective core 102. The status register 106 is readable by each core 102. The configuration register 112 is readable by each core 102 and indirectly writable by each core 102 (via the disable core bit 236 of Fig. 2, as described below). The control unit 104 also includes interrupt logic (not shown) that generates a respective interrupt signal (INTR) 124 to each core 102, which the control unit 104 asserts to interrupt the respective core 102. The interrupt sources in response to which the control unit 104 generates an interrupt signal 124 to a core 102 may include external interrupt sources (for example, the x86 architecture INTR, SMI, and NMI interrupt sources) or bus events (for example, assertion or de-assertion of the x86 architecture-style bus signal STPCLK). In addition, each core 102 may send an inter-core interrupt signal 124 to each of the other cores 102 by writing to the control unit 104. Preferably, unless otherwise indicated, the inter-core interrupts described herein are non-architectural inter-core interrupts requested by the microcode of a core 102 via a microinstruction, which are distinct from conventional architectural inter-core interrupts requested by system software via an architectural instruction. Finally, the control unit 104 may generate an interrupt signal 124 to the cores 102 (a synchronization interrupt) when a synchronization condition has occurred, as described below (for example, see Fig. 21 and block 334 of Fig. 3). The control unit 104 also generates a respective clock signal (CLOCK) 122 to each core 102, which the control unit 104 may selectively turn off, effectively putting the respective core 102 to sleep, and turn back on to wake the core 102 up. The control unit 104 also generates a respective power control signal (PWR) 128 to each core 102, which selectively controls whether the respective core 102 receives power. Thus, via the respective power control signal 128, the control unit 104 may selectively put a core 102 into an even deeper sleep state by turning off power to the core 102, and may turn power back on to the core 102 to wake it up.
A core 102 can write to its corresponding synchronous buffer 108 a value with the synchronization bit set (see the S bit 222 of Fig. 2); this operation is regarded as a synchronization request. As described in more detail below, in one embodiment a synchronization request asks control unit 104 to put the core 102 to sleep and to wake it up when a synchronization condition occurs and/or when a specified wake event occurs. A synchronization condition occurs when all the enabled cores 102 of microprocessor 100 (see the enable bits 254 of Fig. 2), or a specified subset of the enabled cores 102 (see the core set field 228 of Fig. 2), have written the same synchronization condition to their respective synchronous buffers 108. The synchronization condition is specified by a combination of the C bit 224, the synchronization condition or C-state field 226, and the core set field 228 of Fig. 2, which are described in more detail below together with the S bit 222. In response to the occurrence of a synchronization condition, control unit 104 simultaneously wakes up all the cores 102 that are waiting for the synchronization condition, that is, those that have requested it. In another embodiment described below, a core 102 can request that only the core 102 that last wrote the synchronization request be woken up (see the selective wake-up bit 214 of Fig. 2). In yet another embodiment, the synchronization request does not ask that the core 102 be put to sleep; instead, it asks control unit 104 to interrupt the core 102 when the synchronization condition occurs, as described in more detail below, particularly with respect to Fig. 3 and Fig. 21.
Preferably, when control unit 104 detects that a synchronization condition has occurred (because the last core 102 has written its synchronization request to its synchronous buffer 108), control unit 104 puts the last core 102 to sleep, for example by turning off the clock signal 122 to the last-writing core 102, and then wakes up all the cores 102 simultaneously, for example by turning on the clock signals 122 to all the cores 102. In this way, all the cores 102 are woken up, that is, have their clock signals 122 turned on, in exactly the same clock cycle. Waking the cores 102 in exactly the same clock cycle is particularly advantageous for some operations, such as debugging (see the embodiment of Fig. 5). In one embodiment, the non-core 103 includes a single phase-locked loop (PLL) that produces the clock signals 122 supplied to the cores 102. In other embodiments, microprocessor 100 includes multiple phase-locked loops that produce the clock signals 122 supplied to the cores 102.
Control, state and configuration words
Referring to Fig. 2, a block diagram of a control word 202, a status word 242 and a configuration word 252 is shown. A core 102 writes a value of the control word 202 to its synchronous buffer 108 in control unit 104 of Fig. 1 to make an atomic request to sleep and/or to synchronize with all the other cores 102, or a specified subset of them, in microprocessor 100. A core 102 reads a value of the status word 242 from the state buffer 106 of control unit 104 to determine the status information described herein. A core 102 reads a value of the configuration word 252 from the configuration buffer 112 of control unit 104 and uses the value as described below.
Control word 202 includes a wake events field 204, a synchronization control field 206, and a power gate (PG) bit 208. The synchronization control field 206 includes various bits or sub-fields that control putting the core 102 to sleep and/or synchronizing the core 102 with other cores 102. The synchronization control field 206 includes a sleep bit 212, a selective wake-up (SEL WAKE) bit 214, an S bit 222, a C bit 224, a synchronization condition or C-state field 226, a core set field 228, a forced synchronization bit 232, a selective synchronization kill bit 234, and a core disable bit 236. Status word 242 includes a wake events field 244, a lowest common C-state field 246, and an error code field 248. Configuration word 252 includes one enable bit 254 for each core 102 of microprocessor 100, a local core number field 256, and a crystal number field 258.
The wake events field 204 of control word 202 includes multiple bits corresponding to different events. If a core 102 sets a bit in the wake events field 204, control unit 104 wakes up the core 102 (for example, turns on the clock signal 122 to the core 102) when the event corresponding to that bit occurs. One wake event occurs when the core 102 has synchronized with all the other cores specified in the core set field 228. In one embodiment, the core set field 228 can specify: all the cores 102 in microprocessor 100; all the cores 102 that share a cache memory (for example, a level-2 (L2) cache and/or a level-3 (L3) cache) with the instant core 102; all the cores 102 on the same semiconductor crystal as the instant core 102 (see Fig. 4 for an example of a multi-crystal, multi-core microprocessor 100 embodiment); or all the cores 102 on a semiconductor crystal other than that of the instant core 102. A set of cores 102 that share a cache memory may be referred to as a slice. Other examples of wake events include, but are not limited to, an x86 INTR, SMI or NMI, assertion or de-assertion of STPCLK, and an inter-core interrupt. When a core 102 is woken up, it can read the wake events field 244 of status word 242 to determine which wake events are active.
If a core 102 sets the PG bit 208, control unit 104 removes power from the core 102 after putting it to sleep (for example, via the power control signal 128). When control unit 104 subsequently restores power to the core 102, control unit 104 clears the PG bit 208. The use of the PG bit 208 is described in more detail below with respect to Figs. 11 to 13.
If a core 102 sets the sleep bit 212 or the selective wake-up bit 214, control unit 104 puts the core 102 to sleep after the core 102 writes the synchronous buffer 108, with the wake events specified in the wake events field 204. The sleep bit 212 and the selective wake-up bit 214 are mutually exclusive. The difference between them concerns the action control unit 104 takes when a synchronization condition occurs. If a core 102 sets the sleep bit 212, control unit 104 wakes up all the cores 102 when a synchronization condition occurs. In contrast, if a core 102 sets the selective wake-up bit 214, control unit 104 wakes up only the core 102 that last wrote the synchronization condition to its synchronous buffer 108.
If a core 102 sets neither the sleep bit 212 nor the selective wake-up bit 214, control unit 104 does not put the core 102 to sleep and therefore does not wake it up when a synchronization condition occurs. However, control unit 104 still sets the bit of the wake events field 204 that indicates a synchronization condition is active, so the core 102 can detect that the synchronization condition has occurred. Many of the wake events that can be specified in the wake events field 204 can also be sources of an interrupt signal generated by control unit 104 to the core 102. However, the microcode of the core 102 can mask the interrupt sources if desired. Thus, when the core 102 is woken up, the microcode can read the state buffer 106 to determine whether a synchronization condition, a wake event, or both have occurred.
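The interplay of the sleep bit 212 and the selective wake-up bit 214 can be summarized in a short sketch. The Python below is only an illustrative software model, not the hardware logic of control unit 104; the function name and the returned labels are hypothetical.

```python
def action_on_sync_condition(sleep_bit: bool, sel_wake_bit: bool) -> str:
    """Model what control unit 104 does for a requester when a synchronization
    condition occurs. The name and return labels are illustrative assumptions."""
    if sleep_bit and sel_wake_bit:
        # The text states that bits 212 and 214 are mutually exclusive.
        raise ValueError("sleep bit 212 and selective wake-up bit 214 are mutually exclusive")
    if sleep_bit:
        return "wake_all_cores"
    if sel_wake_bit:
        return "wake_last_writer_only"
    # Neither bit set: the core was never put to sleep, so there is nothing to
    # wake; the event is only flagged so that the core can detect it.
    return "flag_event_only"
```

A caller would pass the two bits of a core's most recent control word 202 write to learn which wake-up behavior that request selects.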
If a core 102 sets the S bit 222, it asks control unit 104 to synchronize on a synchronization condition. The synchronization condition is specified by some combination of the C bit 224, the synchronization condition or C-state field 226, and the core set field 228. If the C bit 224 is set, the C-state field 226 specifies a C-state value; if the C bit 224 is clear, the synchronization condition field 226 specifies a non-C-state synchronization condition. Preferably, the values of the synchronization condition or C-state field 226 comprise a bounded set of non-negative integers. In one embodiment, the synchronization condition or C-state field 226 is 4 bits wide. When the C bit 224 is clear, a synchronization condition occurs when all the cores 102 in the specified core set field 228 have written to their respective synchronous buffers 108 with the S bit 222 set and the same value of the synchronization condition field 226. In one embodiment, each value of the synchronization condition field 226 corresponds to a unique synchronization condition, for example, the various synchronization conditions of the exemplary embodiments described below. When the C bit 224 is set, a synchronization condition occurs when all the cores 102 in the specified core set field 228 have written to their respective synchronous buffers 108 with the S bit 222 set, regardless of whether they have written the same value of the C-state field 226. In this case, control unit 104 posts the lowest value written to the C-state field 226 to the lowest common C-state field 246 of the state buffer 106, where it can be read by a core 102, for example, by the master core 102 at block 908 or by the last-writing, selectively woken core 102 at block 1108. In one embodiment, if a core 102 specifies a predetermined value in the synchronization condition field 226 (for example, all bits set), this instructs control unit 104 to match the instant core 102 with any synchronization condition field 226 value specified by the other cores 102.
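The matching rules for the C bit 224 and field 226 can be illustrated with a small software model. This is a sketch under stated assumptions, not the control unit's actual logic: the function name, the tuple layout, and the `MATCH_ANY` encoding (all ones in a 4-bit field) are hypothetical choices made for illustration.

```python
MATCH_ANY = 0xF  # assumed encoding of the "all bits set" preset in a 4-bit field 226


def sync_condition_occurred(requests, core_set):
    """requests maps core id -> (s_bit, c_bit, value) as last written to that
    core's synchronous buffer 108; core_set lists the enabled cores to match.
    Returns (occurred, lowest_common_c_state or None)."""
    pending = [requests.get(core) for core in core_set]
    if any(req is None or not req[0] for req in pending):
        return False, None  # some core has not yet written with its S bit 222 set
    c_bits = {req[1] for req in pending}
    if len(c_bits) != 1:
        return False, None  # mixed C bit 224 values never match
    if c_bits == {True}:
        # C-state request: values need not agree; the lowest one is posted.
        return True, min(req[2] for req in pending)
    # Non-C-state request: all values must agree, ignoring the match-any preset.
    values = {req[2] for req in pending} - {MATCH_ANY}
    return len(values) <= 1, None
```

For instance, two cores that both request C-state synchronization with values 3 and 5 would match, and 3 would be the value posted to the lowest common C-state field 246.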
If a core 102 sets the forced synchronization bit 232, control unit 104 forces all pending synchronization requests to be matched immediately.
Normally, if any core 102 is woken up because of a wake event specified in the wake events field 204, control unit 104 kills all pending synchronization requests by clearing the S bits 222 in the synchronous buffers 108. However, if the core 102 sets the selective synchronization kill bit 234, control unit 104 kills only the pending synchronization request of the core 102 that was woken up by the wake event (one other than the occurrence of a synchronization condition).
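The difference between the default kill behavior and the selective synchronization kill bit 234 can be sketched as follows. This is an illustrative model only; the function name and the dictionary representation of the S bits 222 are assumptions.

```python
def kill_sync_requests(s_bits, woken_core, selective_kill):
    """s_bits maps core id -> S bit 222. Returns the S bits after a core is
    woken by a wake event other than a synchronization condition."""
    if selective_kill:
        # Only the woken core's pending request is killed.
        return {core: s and core != woken_core for core, s in s_bits.items()}
    # Default: every pending synchronization request is killed.
    return {core: False for core in s_bits}
```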
If two or more cores 102 request synchronization on different synchronization conditions, control unit 104 regards this as a deadlock situation. Two or more cores 102 request synchronization on different synchronization conditions if they write to their respective synchronous buffers 108 a value with the S bit 222 set, the C bit 224 clear, and different values of the synchronization condition field 226. For example, if one core 102 writes to its synchronous buffer 108 a value with the S bit 222 set, the C bit 224 clear, and a synchronization condition 226 value of 7, and another core 102 writes to its synchronous buffer 108 a value with the S bit 222 set, the C bit 224 clear, and a synchronization condition 226 value of 9, control unit 104 regards this as a deadlock situation. Likewise, if one core 102 writes to its synchronous buffer 108 a value with the C bit 224 clear and another core 102 writes to its synchronous buffer 108 a value with the C bit 224 set, control unit 104 regards this as a deadlock situation. In response to a deadlock situation, control unit 104 kills all pending synchronization requests and wakes up all sleeping cores 102. Control unit 104 also posts a value in the error code field 248 of the state buffer 106, which the cores 102 can read to determine the cause of the deadlock and take appropriate action. In one embodiment, the error code 248 indicates the synchronization condition written by each core 102, which enables each core to decide whether to proceed with its intended course of action or to defer to another core 102. For example, if one core 102 writes a synchronization condition to perform a power management operation (for example, to execute an x86 MWAIT instruction) and another core 102 writes a synchronization condition to perform a cache management operation (for example, an x86 WBINVD instruction), the core 102 that intended to execute the MWAIT instruction cancels it and defers to the core 102 executing the WBINVD instruction, because MWAIT is an optional operation whereas WBINVD is a mandatory one. As another example, if one core 102 writes a synchronization condition to perform a debug operation (for example, to dump debug state) and another core 102 writes a synchronization condition to perform a cache management operation (for example, a WBINVD instruction), the core 102 that intended to perform the WBINVD defers to the core 102 performing the debug dump by saving the WBINVD state, waiting for the debug dump to occur, and then restoring the WBINVD state and executing the WBINVD instruction.
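The two deadlock cases described above (different non-C-state values, or a mix of set and clear C bits 224) can be expressed as a short check. The sketch below is a simplified software model; the function name and input layout are hypothetical, and it deliberately ignores the match-any preset value of field 226.

```python
def is_deadlock(pending):
    """pending lists (c_bit, value) for every core whose S bit 222 is set."""
    c_bits = {c_bit for c_bit, _ in pending}
    if len(c_bits) > 1:
        return True  # C bit 224 set on one core and clear on another
    if c_bits == {False}:
        # All non-C-state requests: conflicting values mean deadlock.
        return len({value for _, value in pending}) > 1
    return False  # all C-state requests (or no requests) cannot conflict
```

With the values from the example in the text, `is_deadlock([(False, 7), (False, 9)])` reports a deadlock, whereas two C-state requests with different values do not.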
The crystal number field 258 is zero in a single-crystal embodiment. In a multi-crystal embodiment (for example, Fig. 4), the crystal number field 258 indicates which crystal the core 102 reading the configuration buffer 112 resides on. For example, in a two-crystal embodiment, the crystals are designated 0 and 1 and the crystal number field 258 has a value of 0 or 1. In one embodiment, fuses 114 are selectively blown to designate a crystal as 0 or 1, for example.
The local core number field 256 indicates the core number, local to its crystal, of the core 102 that is reading the configuration buffer 112. Preferably, although there is a single configuration buffer 112 shared by all the cores 102, control unit 104 knows which core 102 is reading the configuration buffer 112 and supplies the correct value in the local core number field 256 according to the reader. This enables the microcode of a core 102 to know its local core number among the other cores 102 on the same crystal. In one embodiment, a multiplexer in the non-core 103 portion of microprocessor 100 selects the appropriate value to return in the local core number field 256 of configuration word 252 based on which core 102 is reading the configuration buffer 112. In one embodiment, selectively blown fuses 114 operate together with the multiplexer to return the value of the local core number field 256. Preferably, the value of the local core number field 256 is fixed, independent of which cores 102 on the crystal are enabled, as indicated by the enable bits 254 described below. That is, the value of the local core number field 256 remains fixed even when one or more cores 102 of the crystal are disabled. In addition, the microcode of a core 102 computes the global core number of the core 102, a configuration-related value whose uses are described in more detail below. The global core number indicates the core number of the core 102 global to microprocessor 100. A core 102 computes its global core number using the value of the crystal number field 258. For example, in an embodiment in which microprocessor 100 includes eight cores 102 divided evenly between two crystals having crystal values 0 and 1, the local core number field 256 returns a value of 0, 1, 2 or 3 on each crystal; a core on the crystal whose crystal value is 1 adds 4 to the value returned in the local core number field 256 to compute its global core number.
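The global core number calculation in the eight-core, two-crystal example can be written out as a one-line formula. The sketch below generalizes "add 4 on crystal 1" to multiplying the crystal number by the cores per crystal, which is an assumption consistent with the example but not stated in the text; the names are hypothetical.

```python
CORES_PER_CRYSTAL = 4  # from the eight-core, two-crystal example


def global_core_number(crystal_number: int, local_core_number: int) -> int:
    """Combine crystal number field 258 and local core number field 256."""
    return crystal_number * CORES_PER_CRYSTAL + local_core_number
```

Under this model the four cores of crystal 0 receive global core numbers 0 to 3 and the four cores of crystal 1 receive 4 to 7.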
Each core 102 of microprocessor 100 has a corresponding enable bit 254 in configuration word 252 that indicates whether the core 102 is enabled or disabled. In Fig. 2, the enable bits 254 are denoted individually as enable bit 254-x, where x is the global core number of the corresponding core 102. The example of Fig. 2 assumes eight cores 102 in microprocessor 100. In the examples of Fig. 2 and Fig. 4, enable bit 254-0 indicates whether the core 102 with global core number 0 (for example, core A) is enabled, enable bit 254-1 indicates whether the core 102 with global core number 1 (for example, core B) is enabled, enable bit 254-2 indicates whether the core 102 with global core number 2 (for example, core C) is enabled, and so on. Thus, by knowing the global core numbers, the microcode of a core 102 can determine from configuration word 252 which cores 102 of microprocessor 100 are disabled and which are enabled. Preferably, an enable bit 254 is set if its core 102 is enabled and clear if the core 102 is disabled. When microprocessor 100 is reset, hardware automatically populates the enable bits 254. Preferably, the hardware populates the enable bits 254 based on fuses 114 that are selectively blown when microprocessor 100 is manufactured to indicate whether a given core 102 is enabled or disabled. For example, if a given core 102 is tested and found to be faulty, a fuse 114 can be blown to clear the enable bit 254 of that core 102. In one embodiment, a blown fuse 114 indicates that a core 102 is disabled and prevents clock signals from being provided to the disabled core 102. Each core 102 can write the core disable bit 236 to its synchronous buffer 108 to clear its own enable bit 254, as described in more detail below with respect to Figs. 14 to 16. Preferably, clearing the enable bit 254 does not prevent the core 102 from executing instructions but only updates the configuration buffer 112; the core 102 must set a different bit (not shown) to stop itself from executing instructions, for example, to have its power removed and/or its clock signal turned off. In a multi-crystal microprocessor 100 (for example, Fig. 4), the configuration buffer 112 includes an enable bit 254 for every core 102 in microprocessor 100, that is, not only for the cores 102 of the local crystal but also for the cores 102 of the remote crystal. Preferably, in a multi-crystal microprocessor 100, when a core 102 writes to its synchronous buffer 108, the value is propagated to the shadow synchronous buffer 108 corresponding to that core 102 on the other crystal (see Fig. 4); if the core disable bit 236 is set, this causes an update of the remote crystal's configuration buffer 112, so that the local and remote crystals' configuration buffers 112 hold the same value.
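How microcode might interpret the enable bits 254 can be sketched with simple bit operations. The sketch assumes, purely for illustration, that enable bit 254-x sits at bit position x of an integer; the actual layout of configuration word 252 is not given in the text, and both function names are hypothetical.

```python
def enabled_cores(enable_bits: int, total_cores: int = 8):
    """Return the global core numbers whose enable bit 254-x is set.
    Assumes bit position x holds enable bit 254-x (illustrative layout)."""
    return [n for n in range(total_cores) if (enable_bits >> n) & 1]


def clear_enable_bit(enable_bits: int, global_core_number: int) -> int:
    """Model the effect of a core writing the core disable bit 236."""
    return enable_bits & ~(1 << global_core_number)
```

For example, starting with all eight cores enabled and disabling the core with global core number 2 leaves cores 0, 1 and 3 to 7 enabled.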
In one embodiment, the configuration buffer 112 cannot be written directly by a core 102. However, a write to the configuration buffer 112 by a core 102 causes the local enable bit 254 values to be propagated to the configuration buffers 112 of the other crystals in a multi-crystal microprocessor 100, for example, as described with respect to block 1406 of Fig. 14.
Control unit
Referring to Fig. 3, a flow chart illustrating the operation of control unit 104 is shown. Flow begins at block 302. At block 302, a core 102 writes a synchronization request, that is, writes a control word 202 to its synchronous buffer 108, and the synchronization request is received by control unit 104. In a multi-crystal microprocessor 100 (see, for example, Fig. 4), when a shadow synchronous buffer 108 of a control unit 104 receives a propagated synchronous buffer 108 value from the other crystal 406, the control unit 104 operates effectively according to Fig. 3, that is, as if it had received a synchronization request from one of its local cores 102 (block 302), except that the control unit 104 puts to sleep (for example, block 314), wakes up (blocks 306, 328 or 336), interrupts (block 334), or blocks wake events of (block 326) only the cores 102 on its local crystal 406, and populates only its local state buffer 106 (block 318). Flow proceeds to block 304.
At block 304, control unit 104 examines the synchronization condition specified at block 302 to determine whether a deadlock situation has occurred, as described above with respect to Fig. 2. If so, flow proceeds to block 306; otherwise, flow proceeds to decision block 312.
At block 305, control unit 104 detects the occurrence of a wake event specified in the wake events field 204 of one of the synchronous buffers 108 (other than the occurrence of a synchronization condition, which is detected at block 316). As described below with respect to block 326, control unit 104 may automatically block wake events. Control unit 104 may detect the wake event as an event asynchronous to the writing of a synchronization request at block 302. Flow also proceeds from block 305 to block 306.
At block 306, control unit 104 populates the state buffer 106, kills pending synchronization requests, and wakes up any sleeping cores 102. As described above, waking up a sleeping core 102 may include restoring its power. The cores 102 can then read the state buffer 106, particularly the error code 248, to determine the cause of the deadlock and handle it according to the relative priorities of the conflicting synchronization requests, as described above. Control unit 104 kills all pending synchronization requests (that is, clears the S bit 222 in the synchronous buffer 108 of each core 102), unless block 306 was reached from block 305 and the selective synchronization kill bit 234 is set, in which case control unit 104 kills the pending synchronization request of only the core 102 being woken up by the wake event. If block 306 was reached from block 305, the core 102 can read the wake events field 244 to determine which wake event occurred. Additionally, if the wake event is an unmasked interrupt source, control unit 104 generates an interrupt request to the core 102 via the interrupt signal 124. Flow ends at block 306.
At decision block 312, control unit 104 determines whether the sleep bit 212 or the selective wake-up bit 214 is set. If so, flow proceeds to block 314; otherwise, flow proceeds to decision block 316.
At block 314, control unit 104 puts the core 102 to sleep. As described above, putting a core 102 to sleep may include removing its power. In one embodiment, as an optimization, even if the PG bit 208 is set, control unit 104 does not remove power from the core 102 at block 314 if it is the last-writing core 102 (that is, its write will cause the synchronization condition to occur) and the selective wake-up bit 214 is set, because control unit 104 will immediately wake the last-writing core 102 back up at block 328. In one embodiment, control unit 104 includes synchronization logic and sleep logic that are separate from, but in communication with, each other; furthermore, the synchronization logic and the sleep logic each include a portion of the synchronous buffer 108. Advantageously, the write to the synchronization logic portion of the synchronous buffer 108 and the write to the sleep logic portion of the synchronous buffer 108 are atomic, that is, indivisible. In other words, if the write to one portion occurs, the writes to both the synchronization logic portion and the sleep logic portion are guaranteed to occur. Preferably, the pipeline of the core 102 stalls, not allowing any further writes to occur, until it is guaranteed that the writes to both portions of the synchronous buffer 108 have occurred. An advantage of writing a synchronization request and immediately going to sleep is that the core 102 (for example, its microcode) need not keep running to determine whether the synchronization condition has occurred. This is beneficial because it saves power and does not consume other resources, such as bus and/or memory bandwidth. Note that, to go to sleep without requesting synchronization with other cores 102 (for example, blocks 924 and 1124), a core 102 can write to its synchronous buffer 108 a value with the S bit 222 clear and the sleep bit 212 set, referred to herein as a sleep request; in this case, control unit 104 wakes up the core 102 (for example, block 306) when an unmasked wake event specified in the wake events field 204 occurs (for example, block 305), but does not look for the occurrence of a synchronization condition for this core 102 (for example, block 316). Flow proceeds to decision block 316.
At decision block 316, control unit 104 determines whether a synchronization condition has occurred. If so, flow proceeds to block 318. As described above, a synchronization condition can occur only if the S bit 222 is set. In one embodiment, control unit 104 uses the enable bits 254 of Fig. 2, which indicate which cores 102 in microprocessor 100 are enabled and which are disabled. Control unit 104 looks only at the enabled cores 102 to determine whether a synchronization condition has occurred. A core 102 may be disabled because it was tested and found defective at production time; consequently, a fuse was blown to render the core 102 inoperable and to indicate that the core 102 is disabled. A core 102 may be disabled because software requested that the core 102 be disabled (see, for example, Fig. 15). For example, at the request of a user, the BIOS writes a model specific register (MSR) to request that the core 102 be disabled; in response, the core 102 disables itself (for example, via the core disable bit 236) and notifies the other cores 102 to read the configuration buffer 112, from which the other cores 102 determine that the core 102 is disabled. A core 102 may also be disabled via a microcode patch (see, for example, Fig. 14), which may be produced by blowing fuses 114 and/or loaded from system memory (for example, a FLASH memory). In addition to determining whether a synchronization condition has occurred, control unit 104 examines the forced synchronization bit 232. If it is set, flow proceeds to block 318. If the forced synchronization bit 232 is clear and a synchronization condition has not occurred, flow ends at block 316.
At block 318, control unit 104 populates the state buffer 106. Specifically, if the synchronization condition that occurred is that all the cores 102 requested synchronization on a C-state, control unit 104 populates the lowest common C-state field 246, as described above. Flow proceeds to decision block 322.
At decision block 322, control unit 104 examines the selective wake-up (SEL WAKE) bit 214. If the bit is set, flow proceeds to block 326; otherwise, flow proceeds to decision block 332.
At block 326, control unit 104 blocks all wake events for all the cores 102 other than the instant core 102, which is the core 102 that last wrote the synchronization request to its synchronous buffer 108 at block 302 and thereby caused the synchronization condition to occur. In one embodiment, the logic of control unit 104 simply Boolean-ANDs each wake condition with a signal that is false if wake events are to be blocked and true otherwise. The uses of blocking all wake events for the other cores are described in more detail below, particularly with respect to Figs. 11 to 13. Flow proceeds to block 328.
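The Boolean AND gating of wake events at block 326 can be modeled in a few lines. This is an illustrative software sketch of the described gating, not the hardware implementation; the function name and the dictionary of per-core wake conditions are assumptions.

```python
def gated_wake_signals(wake_events, instant_core, block_others):
    """wake_events maps core id -> True if a wake condition is asserted.
    Each condition is ANDed with a signal that is False when wake events
    are to be blocked for that core and True otherwise."""
    gated = {}
    for core, event in wake_events.items():
        allow = (core == instant_core) or not block_others
        gated[core] = event and allow
    return gated
```

With blocking active, only the instant core's wake condition passes through; with blocking inactive, every asserted condition passes unchanged.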
At block 328, control unit 104 wakes up only the instant core 102, not the other cores that requested the synchronization. Additionally, control unit 104 kills the pending synchronization request of the instant core 102 by clearing its S bit 222, but does not kill the pending synchronization requests of the other cores 102, that is, it leaves the S bits 222 of the other cores 102 set. Advantageously, therefore, if the instant core 102 writes another synchronization request after it is woken up, this will again cause the synchronization condition to occur (assuming the synchronization requests of the other cores 102 have not been killed), an example of which is described below with respect to Fig. 12 and Fig. 13. Flow ends at block 328.
At decision block 332, control unit 104 examines the sleep bit 212. If the bit is set, flow proceeds to block 336; otherwise, flow proceeds to block 334.
At block 334, control unit 104 sends an interrupt signal (a synchronization interrupt) to all the cores 102. The timing diagram of Fig. 21 illustrates an example of a non-sleeping synchronization request. Each core 102 can read the wake events field 244 and detect that the occurrence of a synchronization condition was the cause of the interrupt. Flow reaches block 334 in the case in which the cores 102 chose not to go to sleep when they wrote their synchronization requests. Although this case does not give the cores 102 the same benefits (for example, simultaneous wake-up) as going to sleep, it has the potential advantage of allowing the cores 102 to continue processing instructions while waiting for the last core 102 to write its synchronization request, in situations in which simultaneous wake-up is not needed. Flow ends at block 334.
In square 336, the control unit 104 is waken up by all cores 102 simultaneously.In one embodiment, the control unit 104 are opened into the clock signal 122 of all cores 102 in the same clock cycle exactly.In another embodiment, the control list Member 104 opens the clock signal 122 to all cores 102 in a manner of one interlocks.That is, the control unit 104 is when opening Arteries and veins signal 122 is to each internuclear predetermined quantity (for example, clock order be ten or 100) for introducing a clock cycle.However, when Staggeredly (staggering) unlatching is considered in the present invention arteries and veins signal 122 simultaneously.To reduce by one when all cores 102 are waken up The possibility of power loss spike, it is beneficial that clock signal 122, which is staggeredly opened,.In still another embodiment, in order to reduce electricity When power consumes the possibility of spike, the control unit 104 is opened into the clock signal 122 of all cores 102 in the same clock cycle, But by being initially at offer clock signal 122 in the frequency of a reduction and improving under frequency to target frequency, continue absolutely one Performed in continuous (stuttering) or compacting (throttled) mode.In one embodiment, the synchronization request is as the core 102 The implementing result of micro-code instruction be issued, and the microcode is designed at least some synchronous situation values, and specifies this same It is unique to walk the microcode position of case values.For example, only one place includes a synchronous x requests in microcode, in microcode In only one place include a synchronous y request, the rest may be inferred.In these cases, because all cores 102 are in identical local quilt Wake up, it may be such that Microcode Design personnel design more efficiently and flawless procedure code, therefore it is beneficial to wake up simultaneously. 
In addition, simultaneous wake-up may be particularly advantageous for debugging purposes when attempting to recreate and fix bugs that appear only because of the interaction of multiple cores and that do not appear when a single core is running. Figs. 5 and 6 show such an example. Furthermore, the control unit 104 kills all pending sync requests (for example, it clears the S bit 222 in the sync register 108 of each core 102). Flow ends at block 336.
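The staggered and throttled wake-up options described for block 336 may be illustrated with a minimal Python sketch. The function names, the linear shape of the frequency ramp, and the example numbers are assumptions made for illustration only; the embodiments do not specify them.

```python
def staggered_wake_schedule(num_cores, stagger_cycles):
    """Clock cycle at which the clock signal 122 of each core is turned on.

    stagger_cycles = 0 models turning on every clock in the same cycle.
    """
    return [core * stagger_cycles for core in range(num_cores)]


def ramp_frequencies(start_mhz, target_mhz, steps):
    """Frequencies applied after wake-up, rising linearly (an assumed
    ramp shape) from a reduced frequency up to the target frequency."""
    span = target_mhz - start_mhz
    return [start_mhz + span * i // steps for i in range(1, steps + 1)]


print(staggered_wake_schedule(4, 10))   # one entry per core
print(ramp_frequencies(500, 2000, 3))   # last entry is the target frequency
```

With a stagger of ten cycles, four cores receive their clocks over thirty cycles, which the description still treats as a simultaneous wake-up.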
One advantage of the embodiments described herein is that they may substantially reduce the amount of microcode in a microprocessor because, rather than looping or performing other checks to synchronize operations among multiple cores, the microcode in each core can simply write a sync request, go to sleep, and know that when it wakes up all the cores are at the same place in the microcode. Microcode uses of the sync request mechanism are described below.
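The write, sleep, and wake-together behavior described above may be modeled with a short Python sketch. It is a toy model, not the hardware: the class and method names are hypothetical, and it assumes that a sync condition occurs once every core has a pending request carrying the same sync condition value.

```python
class ControlUnit:
    """Toy model of the sync request barrier kept by the control unit 104."""

    def __init__(self, num_cores):
        self.pending = [None] * num_cores   # sync condition written by each core
        self.asleep = [False] * num_cores

    def write_sync(self, core, condition):
        """A core writes a sync request and is put to sleep.

        Returns True when this write completes the sync condition.
        """
        self.pending[core] = condition
        self.asleep[core] = True
        return self._check()

    def _check(self):
        first = self.pending[0]
        if first is None or any(p != first for p in self.pending):
            return False
        # Sync condition occurred: wake every core and kill pending requests.
        self.asleep = [False] * len(self.asleep)
        self.pending = [None] * len(self.pending)
        return True


cu = ControlUnit(3)
cu.write_sync(0, 1)          # core 0 sleeps and waits
cu.write_sync(2, 1)          # core 2 sleeps and waits
print(cu.write_sync(1, 1))   # the last writer triggers the common wake-up
```

Because the waiting is done by the control unit, the code running on each core needs no polling loop.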
Multi-die microprocessor
Referring to Fig. 4, a block diagram of another embodiment of the microprocessor 100 is shown. The microprocessor 100 of Fig. 4 is similar in many respects to the microprocessor 100 of Fig. 1, in that it is a multi-core processor and its cores 102 are similar. However, the embodiment of Fig. 4 is a multi-die configuration. That is, the microprocessor 100 includes multiple semiconductor dies 406 mounted in a common package and communicating with one another via an inter-die bus 404. The embodiment of Fig. 4 includes two dies 406, denoted die A 406A and die B 406B, coupled by the inter-die bus 404. In addition, each die 406 includes an inter-die bus unit 402 that interfaces the respective die 406 to the inter-die bus 404. Further, each die 406 includes, in its uncore 103, a control unit 104 coupled to the respective cores 102 and to the inter-die bus unit 402. In the embodiment of Fig. 4, die A 406A includes four cores 102 (core A 102A, core B 102B, core C 102C and core D 102D) coupled to a control unit A 104A, which is coupled to an inter-die bus unit A 402A; likewise, die B 406B includes four cores 102 (core E 102E, core F 102F, core G 102G and core H 102H) coupled to a control unit B 104B, which is coupled to an inter-die bus unit B 402B. Finally, each control unit 104 includes not only a sync register 108 for each core in its own die 406, but also a sync register 108 for each core in the other die 406; the latter are the shadow registers shown in Fig. 4. Therefore, each control unit in the embodiment of Fig. 4 includes eight sync registers 108, denoted 108A, 108B, 108C, 108D, 108E, 108F, 108G and 108H. In control unit A 104A, sync registers 108E, 108F, 108G and 108H are shadow registers, whereas in control unit B 104B, sync registers 108A, 108B, 108C and 108D are shadow registers.
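The division into local and shadow sync registers in the two-die embodiment of Fig. 4 may be tabulated with a small Python sketch. The dictionary layout and the function name are hypothetical; only the register and core labels come from the description.

```python
# Cores on each die, as labeled in Fig. 4.
DIES = {"A": ["A", "B", "C", "D"], "B": ["E", "F", "G", "H"]}


def sync_registers(control_unit_die):
    """Map each sync register 108x to 'local' or 'shadow' as seen by the
    control unit 104 of the given die."""
    registers = {}
    for die, cores in DIES.items():
        kind = "local" if die == control_unit_die else "shadow"
        for core in cores:
            registers["108" + core] = kind
    return registers


for name, kind in sync_registers("A").items():
    print(name, kind)
```

Each control unit therefore holds eight registers, four of which mirror cores on the other die.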
When a core 102 writes a value to its sync register 108, the control unit 104 in the die 406 of that core 102 writes the value, via the inter-die bus units 402 and the inter-die bus 404, to the corresponding shadow register 108 in the other die 406. In addition, if the disable core bit 236 is set in the value propagated to the shadow sync register 108, the control unit 104 also updates the corresponding enable bit 254 in the configuration register 112. In this way, the occurrence of a sync condition (including a trans-die sync condition) can be detected even when the core configuration of the microprocessor 100 changes dynamically (for example, Figs. 14 to 16). In one embodiment, the inter-die bus 404 is a relatively low-speed bus, the propagation may take a predetermined number of core clock cycles on the order of 100, and each control unit 104 includes a state machine that takes a predetermined amount of time to detect the occurrence of the sync condition and to turn on the clock signals to all the cores 102 in its respective die 406. Preferably, after the control unit 104 in the local die 406 (that is, the die 406 that includes the writing core 102) begins writing the value to the other die 406 (for example, once it has been granted the inter-die bus 404), it is configured to delay updating the local sync register until a predetermined amount of time has elapsed (for example, the sum of the propagation time and the time the state machine takes to detect the occurrence of the sync condition). In this manner, the control units 104 in both dies detect the occurrence of a sync condition simultaneously and turn on the clock signals to all the cores 102 in both dies 406 at the same time. This may be particularly advantageous for debugging purposes when attempting to recreate and fix bugs that appear only because of the interaction of multiple cores and that do not appear when a single core is running. Figs. 5 and 6 describe embodiments that may take advantage of this feature.
Debug operations
The cores 102 of the microprocessor 100 are configured to perform individual debug operations, such as breakpoints on instruction execution and data access. In addition, the microprocessor 100 is configured to perform trans-core debug operations, that is, debug operations that involve more than one core 102 of the microprocessor 100.
Referring to Fig. 5, a flowchart illustrating operation of the microprocessor 100 to dump debug information is shown. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively dump the state of the microprocessor 100. More specifically, Fig. 5 describes the operation of the one core that receives the request to dump debug information, whose flow begins at block 502, and the operation of the other cores 102, whose flow begins at block 532.
In block 502, one of the cores 102 receives a request to dump debug information. Preferably, the debug information includes the state of the core 102 or a subset thereof. Preferably, the debug information is dumped to system memory or to an external bus that can be monitored by debug equipment, such as a logic analyzer. In response to the request, the core 102 sends a debug dump message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the request to dump debug information (in block 502) or in response to the interrupt (in block 532), and remains in microcode, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted), until block 528. In one embodiment, the core 102 takes interrupts only when it is asleep and at architectural instruction boundaries. In one embodiment, the various inter-core messages described herein (such as the message of block 502 and others, such as those of blocks 702, 1502, 2606 and 3206) are sent and received via the sync condition or C-state field 226 of the control word of the sync register 108. In other embodiments, the inter-core messages are sent and received via the uncore private random access memory 116. Flow proceeds from block 502 to block 504.
In block 532, one of the other cores 102 (that is, a core 102 other than the core 102 that received the debug dump request in block 502) is interrupted and receives the debug dump message as a result of the inter-core interrupt and message sent in block 502. As noted above, although the flow of block 532 is described from the perspective of a single core 102, each of the other cores 102 (that is, every core 102 other than the one of block 502) is interrupted and receives the message in block 532, and performs the steps of blocks 504 to 528. Flow proceeds from block 532 to block 504.
In block 504, the core 102 writes a sync request with a sync condition of 1 (denoted SYNC 1 in Fig. 5) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 506.
In block 506, the core 102 is awakened by the control unit 104 when all the cores have written a SYNC 1. Flow proceeds to block 508.
In block 508, the core 102 dumps its state to memory. Flow proceeds to block 514.
In block 514, the core 102 writes a SYNC 2, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 516.
In block 516, the core 102 is awakened by the control unit 104 when all the cores have written a SYNC 2. Flow proceeds to block 518.
In block 518, the core 102 saves the memory address to which the debug information was dumped in block 508, sets a flag that persists through a reset, and then resets itself. The reset microcode of the core 102 detects the flag and reloads the state of the core from the saved memory address. Flow proceeds to block 524.
In block 524, the core 102 writes a SYNC 3, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 526.
In block 526, the core 102 is awakened by the control unit 104 when all the cores have written a SYNC 3. Flow proceeds to block 528.
In block 528, the core 102 comes out of reset and begins fetching architectural (for example, x86) instructions based on the state reloaded in block 518. Flow ends at block 528.
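The dump, reset, and reload steps of blocks 508 and 518 may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class, its attribute names, the dictionary used as memory, and the example register contents are all hypothetical, and the sync requests between the steps are omitted.

```python
class Core:
    """Toy model of one core 102 during the debug dump flow of Fig. 5."""

    def __init__(self, state):
        self.state = dict(state)   # hypothetical architectural state
        self.dump_addr = None      # saved dump address, assumed to survive reset
        self.reload_flag = False   # flag that persists through reset

    def dump_state(self, memory, addr):
        """Block 508: copy the core state to memory."""
        memory[addr] = dict(self.state)

    def reset_and_reload(self, memory, addr):
        """Block 518: save the dump address, set the flag, reset, reload."""
        self.dump_addr, self.reload_flag = addr, True
        self.state = {}                        # the reset clears working state
        if self.reload_flag:                   # reset microcode detects the flag
            self.state = dict(memory[self.dump_addr])


memory = {}
core = Core({"pc": 0x1000, "eax": 7})
core.dump_state(memory, 0x8000)
core.reset_and_reload(memory, 0x8000)
print(core.state == memory[0x8000])
```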
Referring to Fig. 6, a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of Fig. 5 is shown. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, the events proceed in time as described below.
Core 0 receives a debug dump request and, in response, sends a debug dump message and an interrupt to core 1 and core 2 (per block 502). Core 0 then writes a SYNC 1 and is put to sleep (per block 504).
Each of core 1 and core 2 is eventually interrupted from its current task and reads the message (per block 532). In response, each of core 1 and core 2 writes a SYNC 1 and is put to sleep (per block 504). As shown, the time at which each core writes the SYNC 1 may differ, for example because of the instruction that is executing when the interrupt is asserted.
When all the cores have written a SYNC 1, the control unit 104 wakes up all the cores simultaneously (per block 506). Each core then dumps its state to memory (per block 508), writes a SYNC 2 and is put to sleep (per block 514). The amount of time needed to dump the state may differ; therefore, the time at which each core writes the SYNC 2 may differ, as shown.
When all the cores have written a SYNC 2, the control unit 104 wakes up all the cores simultaneously (per block 516). Each core then resets itself and reloads its state from memory (per block 518), writes a SYNC 3 and is put to sleep (per block 524). As shown, the amount of time needed to reset and reload the state may differ; therefore, the time at which each core writes the SYNC 3 may differ.
When all the cores have written a SYNC 3, the control unit 104 wakes up all the cores simultaneously (per block 526). Each core then begins fetching architectural instructions at the point at which it was interrupted (per block 528).
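The timing behavior just described, in which each phase ends only when the slowest core has written its sync request, may be sketched in Python. The function name and the cycle counts are made up for illustration; only the rule that all cores wake together after the last write is taken from the description.

```python
def common_wake_times(work_cycles):
    """work_cycles[p][c] is the number of cycles core c spends in phase p
    before writing the sync request that ends the phase.

    Returns the cycle at which the control unit 104 wakes all cores after
    each sync request (SYNC 1, SYNC 2, SYNC 3, ...).
    """
    wake = 0
    wakes = []
    for phase in work_cycles:
        write_times = [wake + cycles for cycles in phase]
        wake = max(write_times)   # all cores wake when the last one has written
        wakes.append(wake)
    return wakes


phases = [
    [0, 7, 3],      # delay until each core writes SYNC 1 (interrupt latency)
    [20, 25, 22],   # time to dump state before SYNC 2
    [40, 38, 41],   # time to reset and reload state before SYNC 3
]
print(common_wake_times(phases))
```

Cores that finish a phase early simply sleep until the common wake-up, so every core starts the next phase in the same cycle.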
A conventional solution for synchronizing operations among multiple processors is to use software semaphores. However, a drawback of the conventional solution is that it cannot provide clock-level synchronization. An advantage of the embodiments described herein is that the control unit 104 can turn on the clock signals 122 to all the cores 102 simultaneously.
In the manner described above, an engineer debugging the microprocessor 100 can configure one of the cores 102 to periodically generate checkpoints at which it generates a debug dump request, for example after a predetermined number of instructions have been executed. While the microprocessor 100 is running, the engineer captures all the activity on the external bus of the microprocessor 100 in a log. The portion of the log near the time at which the bug is suspected to have occurred can then be provided to a software simulator that simulates the microprocessor 100 to help the engineer debug. The simulator simulates the execution of the instructions by each core 102 and uses the log information to simulate the transactions on the external bus of the microprocessor 100. In one embodiment, the simulators of all the cores 102 are started simultaneously from a reset point. Therefore, it is highly desirable that all the cores 102 of the microprocessor 100 actually come out of reset at the same time (for example, after SYNC 2). In addition, because each core 102 waits to dump its state until all the other cores 102 have stopped their current tasks (for example, after SYNC 1), the dumping of state by one core 102 does not interfere with the code and/or hardware being debugged on the other cores (for example, through shared memory bus or cache interactions), which may increase the likelihood of reproducing the bug and determining its cause. Similarly, because each core 102 waits to begin fetching architectural instructions until all the cores 102 have finished reloading their state (for example, after SYNC 3), the reloading of state by one core 102 does not interfere with the code and/or hardware being debugged on the other cores, which may increase the likelihood of reproducing the bug and determining its cause.
These benefits provide advantages over existing methods, such as that of U.S. Pat. No. 8,370,684, which is incorporated herein by reference for all purposes and which does not enjoy the benefit of the sync request capability.
Cache control operations
The cores 102 of the microprocessor 100 are configured to perform individual cache control operations, such as operations on local cache memories, that is, caches that are not shared by two or more cores 102. In addition, the microprocessor 100 is configured to perform trans-core cache control operations, that is, operations that involve more than one core 102 of the microprocessor 100, for example because they involve a shared cache memory 119.
Referring to Figs. 7A and 7B, a flowchart illustrating operation of the microprocessor 100 to perform a trans-core cache control operation is shown. The embodiment of Figs. 7A and 7B describes how the microprocessor 100 performs an x86 architecture write back and invalidate cache (WBINVD) instruction. A WBINVD instruction instructs the core 102 executing it to write back all modified lines in the cache memories of the microprocessor 100 to system memory and to invalidate, or flush, the cache memories. The WBINVD instruction also instructs the core 102 to issue special bus cycles that direct any cache memories external to the microprocessor 100 to write back their modified data and invalidate themselves. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively write back the modified cache lines and invalidate the cache memories of the microprocessor 100. More specifically, Figs. 7A and 7B describe the operation of the one core that encounters the WBINVD instruction, whose flow begins at block 702, and the operation of the other cores 102, whose flow begins at block 752.
In block 702, one of the cores 102 encounters a WBINVD instruction. In response, the core 102 sends a WBINVD instruction message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the WBINVD instruction (in block 702) or in response to the interrupt (in block 752), and remains in microcode, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted), until flow reaches block 748/749. Flow proceeds from block 702 to block 704.
In block 752, one of the other cores 102 (that is, a core other than the core 102 that encountered the WBINVD instruction in block 702) is interrupted and receives the WBINVD instruction message as a result of the inter-core interrupt sent in block 702. As noted above, although the flow of block 752 is described from the perspective of a single core 102, each of the other cores 102 (that is, every core 102 other than the one of block 702) is interrupted and receives the message in block 752, and performs the steps of blocks 704 to 749. Flow proceeds from block 752 to block 704.
In block 704, the core 102 writes a sync request with a sync condition of 4 (denoted SYNC 4 in Figs. 7A and 7B) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 706.
In block 706, the core 102 is awakened by the control unit 104 when all the cores 102 have written a SYNC 4. Flow proceeds to block 708.
In block 708, the core 102 writes back and invalidates its local cache memories, for example level-1 (L1) cache memories that the core 102 does not share with other cores 102. Flow proceeds to block 714.
In block 714, the core 102 writes a SYNC 5, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 716.
In block 716, the core 102 is awakened by the control unit 104 when all the cores 102 have written a SYNC 5. Flow proceeds to decision block 717.
In decision block 717, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 718; otherwise, flow proceeds to block 724.
In block 718, the core 102 writes back and invalidates the shared cache memory 119. In one embodiment, the microprocessor 100 includes multiple chips, in each of which some but not all of the cores 102 of the microprocessor 100 share a cache memory, as described above. In such an embodiment, intermediate operations (not shown) similar to blocks 717 to 726 are performed, in which one of the cores 102 in the chip writes back and invalidates the shared cache memory while the other core or cores of the chip go back to sleep, similar to block 724, to wait until that cache has been invalidated. Flow proceeds to block 724.
In block 724, the core 102 writes a SYNC 6, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 726.
In block 726, the core 102 is awakened by the control unit 104 when all the cores 102 have written a SYNC 6. Flow proceeds to decision block 727.
In decision block 727, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 728; otherwise, flow proceeds to block 744.
In block 728, the core 102 issues the special bus cycles that cause the external caches to be written back and invalidated. Flow proceeds to block 744.
In block 744, the core 102 writes a SYNC 13, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 746.
In block 746, the core 102 is awakened by the control unit 104 when all the cores 102 have written a SYNC 13. Flow proceeds to decision block 747.
In decision block 747, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 748; otherwise, flow proceeds to block 749.
In block 748, the core 102 completes the WBINVD instruction, which includes retiring the WBINVD instruction and may include relinquishing ownership of a hardware semaphore (see Fig. 20). Flow ends at block 748.
In block 749, the core 102 resumes the task it was performing before it was interrupted in block 752. Flow ends at block 749.
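The per-core sequence of Figs. 7A and 7B may be summarized with a short Python sketch. The function name and the step strings are hypothetical labels; the ordering of the steps and the sync conditions follows blocks 704 to 749 as described above.

```python
def wbinvd_steps(is_initiator):
    """Ordered steps taken by one core 102.

    is_initiator is True for the core that encountered the WBINVD
    instruction (block 702) and False for an interrupted core (block 752).
    """
    steps = ["SYNC 4", "write back and invalidate local caches", "SYNC 5"]
    if is_initiator:                       # decision block 717
        steps.append("write back and invalidate shared cache 119")
    steps.append("SYNC 6")
    if is_initiator:                       # decision block 727
        steps.append("issue special bus cycles for external caches")
    steps.append("SYNC 13")
    # Decision block 747: retire the instruction or resume the old task.
    steps.append("retire WBINVD" if is_initiator else "resume interrupted task")
    return steps


for step in wbinvd_steps(True):
    print(step)
```

Both kinds of core pass through the same four sync requests, which is what lets the control unit 104 hold the other cores asleep while the initiating core handles the shared and external caches.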
Referring to Fig. 8, a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of Figs. 7A and 7B is shown. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters a WBINVD instruction and, in response, sends a WBINVD instruction message and an interrupt to core 1 and core 2 (per block 702). Core 0 then writes a SYNC 4 and is put to sleep (per block 704).
Each of core 1 and core 2 is eventually interrupted from its current task and reads the message (per block 752). In response, each of core 1 and core 2 writes a SYNC 4 and is put to sleep (per block 704). As shown, the time at which each core writes the SYNC 4 may differ.
When all the cores have written a SYNC 4, the control unit 104 wakes up all the cores simultaneously (per block 706). Each core then writes back and invalidates its own local cache memories (per block 708), writes a SYNC 5 and is put to sleep (per block 714). The amount of time needed to write back and invalidate the cache memories may differ; therefore, the time at which each core writes the SYNC 5 may differ, as shown.
When all the cores have written a SYNC 5, the control unit 104 wakes up all the cores simultaneously (per block 716). Only the core that encountered the WBINVD instruction writes back and invalidates the shared cache memory 119 (per block 718), and all the cores write a SYNC 6 and are put to sleep (per block 724). Because only one core writes back and invalidates the shared cache memory 119, the time at which each core writes the SYNC 6 may differ.
When all the cores have written a SYNC 6, the control unit 104 wakes up all the cores simultaneously (per block 726). Only the core that encountered the WBINVD instruction completes the WBINVD instruction (per block 748), and all the other cores resume the processing they were performing before the interrupt (per block 749).
It should be understood that although embodiments have been described in which the cache control instruction is an x86 WBINVD instruction, other embodiments are contemplated in which sync requests are used to perform other cache control instructions. For example, the microprocessor 100 may perform similar actions to execute an x86 INVD instruction, which simply invalidates the caches without writing back the cache data (in blocks 708 and 718). As another example, the cache control instruction may belong to an instruction set architecture other than the x86 architecture.
Power management operations
The cores 102 of the microprocessor 100 are configured to perform individual power reduction actions, such as, but not limited to, ceasing to execute instructions, requesting the control unit 104 to stop sending clock signals to the core 102, requesting the control unit 104 to remove power from the core 102, writing back and invalidating the local (that is, non-shared) cache memories of the core 102, and saving the state of the core 102 to an external memory, such as the private random access memory 116. When a core 102 has performed one or more core-specific power reduction actions, it has entered a "core" C-state (also referred to as a core idle state or core sleep state). In one embodiment, the C-state values may correspond roughly to the processor states of the well-known Advanced Configuration and Power Interface (ACPI) specification, but may also include finer granularity. In general, a core 102 enters a core C-state in response to a request from the operating system. For example, the x86 architecture monitor wait (MWAIT) instruction is a power management instruction that provides a hint, namely a target C-state, to the core 102 executing the instruction in order to allow the microprocessor 100 to enter an optimized state, such as a lower power consumption state. In the case of an MWAIT instruction, the target C-states are proprietary rather than ACPI C-states. Core C-state 0 (C0) corresponds to the running state of the core 102, and progressively larger C-state values correspond to progressively less active or less responsive states (such as the C1, C2 and C3 states). A progressively less responsive or less active state refers to a configuration or operating state that saves more power relative to a more active or more responsive state, or that is for some reason relatively less responsive (for example, it has a longer wake-up latency or is less fully enabled). Examples of power saving actions that a core 102 may take are stopping the execution of instructions; stopping the clock signals to, lowering the voltage of, and/or removing power from a portion of the core (for example, functional units and/or local caches) or the entire core.
In addition, the microprocessor 100 is configured to perform trans-core power reduction actions. Trans-core power reduction actions involve, or affect, multiple cores 102 of the microprocessor 100. For example, the shared cache memory 119 may be large and consume a relatively large amount of power. Therefore, significant power savings can be achieved by removing the clock signal and/or power supplied to the shared cache memory 119. However, in order to remove the clock signal and/or power from the shared cache memory 119, all the cores 102 that share the cache memory must agree, so that data coherency is maintained. Embodiments are contemplated in which the microprocessor 100 includes other shared power-related resources, such as shared clock and power sources. In one embodiment, the microprocessor 100 is coupled to a system chipset that includes a memory controller, peripheral controllers and/or a power management controller. In other embodiments, one or more of the controllers are integrated into the microprocessor 100. System power savings can be achieved by having the microprocessor 100 notify the controllers so that they can take power saving actions. For example, the microprocessor 100 may notify the controllers that the caches of the microprocessor have been invalidated and turned off, so that they need not be snooped.
In addition to the notion of a core C-state, the microprocessor 100 generally has the notion of a "package" C-state (also referred to as a package idle state or package sleep state). The package C-state corresponds to the lowest (that is, highest power consuming) common core C-state of the cores 102 (for example, see field 246 of Fig. 2 and block 318 of Fig. 3). However, in addition to the core-specific power reduction actions, the package C-state involves the microprocessor 100 performing one or more trans-core power reduction actions. Examples of trans-core power saving actions associated with a package C-state include turning off a phase-locked loop (PLL) that generates the clock signals, and flushing the shared cache memory 119 and stopping its clock and/or power, which allows the memory/peripheral controllers to refrain from snooping the local and shared cache memories of the microprocessor 100. Other examples are changing the voltage, the frequency and/or the bus clock ratio; reducing the size of a cache memory, such as the shared cache memory 119; and running the shared cache memory 119 at half speed.
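The relationship between core C-states and the package C-state may be illustrated with a one-function Python sketch. The function name is hypothetical, and the sketch assumes that C-states are represented as non-negative integers with C0 as the running state, as in the description.

```python
def package_c_state(core_c_states):
    """Lowest common core C-state, i.e. the numerically smallest (highest
    power consuming) C-state among all the cores 102."""
    return min(core_c_states)


# Cores requesting C3, C1 and C4: the package can go no deeper than C1.
print(package_c_state([3, 1, 4]))
```

A single core that remains in C0 therefore keeps the whole package in its running state, which is why every core must agree before the trans-core actions are taken.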
In many cases, the operating system is effectively limited to executing instructions on individual cores 102 and can therefore put individual cores to sleep (for example, into a core C-state), but has no means of directly putting the microprocessor 100 to sleep (for example, into a package C-state). Advantageously, in the embodiments described, the cores 102 of the microprocessor 100 work cooperatively, with the help of the control unit 104, to detect when all the cores 102 have entered a core C-state and are ready for the trans-core power saving actions to take place.
Referring to Fig. 9, a flowchart illustrating operation of the microprocessor 100 to enter a low power package C-state is shown. The embodiment of Fig. 9 is described using an example in which the microprocessor 100 is coupled to a chipset and MWAIT instructions are executed. However, it should be understood that in other embodiments the operating system uses other power management instructions, and the master core 102 communicates with controllers that are integrated into the microprocessor 100 and that use a handshake protocol different from the one described.
The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 may encounter an MWAIT instruction and operates according to the description so that the cores collectively cause the microprocessor 100 to enter the optimized state. Flow begins at block 902.
In block 902, a core 102 encounters an MWAIT instruction that specifies a target C-state, denoted Cx in Fig. 9, where x is a non-negative integer value. Flow proceeds to block 904.
In block 904, the core 102 writes to its sync register 108 a sync request with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in Fig. 9). In addition, the sync request specifies in its wake events field 204 that the core 102 is to be awakened on all wake events. As a result, the control unit 104 puts the core 102 to sleep. Preferably, the core 102 writes back and invalidates its local cache memories before it writes the SYNC Cx. Flow proceeds to block 906.
In block 906, the core 102 is awakened by the control unit 104 when all the cores 102 have written a SYNC Cx. As described above, the value of x written by the different cores 102 may differ, and the control unit 104 posts the lowest common C-state value to the lowest common C-state field 246 of the status word 242 of the status register 106 (per block 318). Before block 906, while the core 102 is asleep, it may be awakened by a wake event, such as an interrupt (for example, per blocks 305 and 306). More specifically, there is no guarantee that the operating system will execute an MWAIT instruction on all the cores 102, which would allow the microprocessor 100 to perform the power saving actions associated with a package C-state, before a wake event (for example, an interrupt) directed to one of the cores 102 effectively cancels its MWAIT instruction. However, once the core 102 is awakened in block 906, the core 102 (indeed, all the cores 102) is still executing microcode as a result of the MWAIT instruction (of block 902) and remains in microcode, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted), until block 924. In other words, while fewer than all of the cores 102 have received an MWAIT instruction to go to sleep, individual cores 102 may sleep, but the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all the cores 102 have agreed to enter a package sleep state, which is effectively indicated by the occurrence of the sync condition in block 906, the master core 102 is allowed to complete a package sleep state handshake protocol with the chipset (for example, blocks 908, 909 and 921 below) without being interrupted and without any of the other cores 102 being interrupted. Flow proceeds to decision block 907.
In decision block 907, the core 102 determines whether it is the main core 102 of the microprocessor 100. Preferably, a core 102 is the main core 102 if it determined at reset time that it is the BSP. If the core is the main core, flow proceeds to block 908; otherwise, flow proceeds to block 914.
In block 908, the main core 102 writes back and invalidates the shared cache memory 119 and then communicates to the chipset that it may take appropriate actions to reduce power consumption. For example, the memory controller and/or peripheral controller may refrain from snooping the local and shared cache memories of the microprocessor 100, since they all remain invalid while the microprocessor 100 is in the package C-state. As another example, the chipset may signal the microprocessor 100 to cause it to take power-saving actions (for example, by asserting x86-style STPCLK, SLP, DPSLP, NAP and VRDSLP signals, as described below). Preferably, the core 102 communicates the power management information based on the value of the lowest common C-state field 246. In one embodiment, the core 102 issues an I/O read bus cycle to an I/O address that provides the chipset with the relevant power management information, for example, the package C-state value. Flow proceeds to block 909.
In block 909, the main core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal is not asserted after a predetermined number of clock cycles, the control unit 104 detects this condition, kills the pending synchronization requests of the cores 102, wakes up all of the cores 102 and indicates the error in the error code field 248. Flow proceeds to block 914.
In block 914, the core 102 writes a SYNC 14. In one embodiment, the synchronization request specifies in its wake events field 204 that the core 102 is not to be awakened on any wake event. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 916.
In block 916, the core 102 is awakened by the control unit 104 once all of the cores 102 have written a SYNC 14. Flow proceeds to decision block 919.
In decision block 919, the core 102 determines whether it is the main core 102 of the microprocessor 100. If so, flow proceeds to block 921; otherwise, flow proceeds to block 924.
In block 921, the main core 102 issues a stop grant cycle on the bus of the microprocessor 100 to notify the chipset that it may take uncore (that is, package-wide) power-saving actions associated with the microprocessor 100 as a whole, such as refraining from snooping the cache memories of the microprocessor 100, removing the bus clock (for example, x86-style BCLK) from the microprocessor 100, and asserting other signals on the bus (for example, x86-style SLP, DPSLP, NAP, VRDSLP) to cause the microprocessor 100 to remove clocks and/or power from various portions of the microprocessor 100. Although the embodiments described herein involve a handshake protocol between the microprocessor 100 and the chipset that comprises an I/O read (in block 908), the assertion of STPCLK (in block 909) and the issuing of a stop grant cycle (in block 921), all of which are historically associated with x86 architecture systems, it should be understood that other embodiments are contemplated that involve systems based on other instruction set architectures with different protocols, in which it is likewise desirable to save power, improve performance and/or reduce complexity. Flow proceeds to block 924.
In block 924, the core 102 writes a sleep request (that is, a request with the sleep bit 212 set and the S bit 222 clear) to the synchronization register 108. Additionally, the request specifies in its wake events field 204 that the core 102 is to be awakened only on the wake event of the de-assertion of STPCLK. Consequently, the control unit 104 puts the core 102 to sleep. Flow ends at block 924.
Referring to Figure 10, a timing diagram is shown illustrating an example of the operation of the microprocessor 100 according to the flowchart of Figure 9. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction specifying C-state 4 (MWAIT C4) (per block 902). Core 0 then writes a SYNC C4 and is put to sleep (per block 904). Core 1 encounters an MWAIT instruction specifying C-state 3 (MWAIT C3) (per block 902). Core 1 then writes a SYNC C3 and is put to sleep (per block 904). Core 2 encounters an MWAIT instruction specifying C-state 2 (MWAIT C2) (per block 902). Core 2 then writes a SYNC C2 and is put to sleep (per block 904). As shown, the time at which each core writes its SYNC Cx may vary. Indeed, one or more of the cores may not encounter an MWAIT instruction until some other event, such as an interrupt, has occurred.
Once all of the cores have written a SYNC Cx, the control unit 104 wakes up all of the cores simultaneously (per block 906). The main core then issues the I/O read bus cycle (per block 908) and waits for the assertion of STPCLK (per block 909). All of the cores write a SYNC 14 and are put to sleep (per block 914). Because only the main core flushes the shared cache memory 119, issues the I/O read bus cycle and waits for the assertion of STPCLK, the time at which each core writes its SYNC 14 may vary, as shown. Indeed, the main core may write its SYNC 14 on the order of hundreds of microseconds after the other cores.
Once all of the cores have written a SYNC 14, the control unit 104 wakes up all of the cores simultaneously (per block 916). Only the main core issues the stop grant cycle (per block 921). All of the cores write a sleep request that waits on the de-assertion of STPCLK (~STPCLK) and are put to sleep (per block 924). Because only the main core issues the stop grant cycle, the time at which each core writes its sleep request may vary, as shown.
When the STPCLK signal is de-asserted, the control unit 104 wakes up all of the cores.
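The first synchronization point of the Figure 10 example can be sketched as follows. This is a minimal illustration rather than the patented hardware: the `ControlUnit` class and its `write_sync_cx` method are hypothetical names, and the sketch assumes that the "lowest common" C-state is simply the numeric minimum of the values the cores requested.

```python
from dataclasses import dataclass, field


@dataclass
class ControlUnit:
    """Toy stand-in for control unit 104 (hypothetical class, illustration only)."""
    num_cores: int
    pending: dict = field(default_factory=dict)  # core id -> requested C-state

    def write_sync_cx(self, core, c_state):
        """Record a SYNC Cx. Returns None while the writer must sleep, or
        (lowest common C-state, cores to wake) once every core has written."""
        self.pending[core] = c_state
        if len(self.pending) < self.num_cores:
            return None
        # Assumption: "lowest common" is the minimum of the requested values.
        lowest_common = min(self.pending.values())
        woken = sorted(self.pending)
        self.pending.clear()  # requests are consumed when the cores are woken
        return lowest_common, woken


cu = ControlUnit(num_cores=3)
first = cu.write_sync_cx(0, 4)   # core 0: MWAIT C4
second = cu.write_sync_cx(1, 3)  # core 1: MWAIT C3
third = cu.write_sync_cx(2, 2)   # core 2: MWAIT C2
print(first, second, third)
```

In this sketch the first two writers get no wake-up, and the third write satisfies the synchronization condition, wakes all three cores and reports C-state 2 as the lowest common value.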
As may be observed from Figure 10, cores 1 and 2 are advantageously able to sleep for a significant portion of the time while core 0 performs the handshake protocol. It is noted, however, that the time required to wake up the microprocessor 100 from the package sleep state is generally proportional to the latency of the sleep state (that is, to how great the power savings of the sleep state are). Hence, where the package sleep state has a relatively long latency (or even where the sleep states of the individual cores 102 have a long latency), it may be desirable to further reduce the occurrences of wake-ups and/or the wake-up time associated with the handshake protocol. Figure 11 describes an embodiment in which a single core 102 handles the handshake protocol while the other cores 102 remain asleep. Furthermore, in the embodiment of Figure 11, further power savings may be obtained by reducing the number of cores 102 that are awakened in response to a wake event.
Referring to Figure 11, a flowchart is shown illustrating the operation of the microprocessor 100 to enter a low-power package C-state according to another embodiment of the present invention. The embodiment of Figure 11 is described using the example of the execution of MWAIT instructions in a system in which the microprocessor 100 is coupled to a chipset. However, it should be understood that in other embodiments the operating system employs other power management instructions, and the last-synchronizing core 102 communicates with a controller that is integrated within the microprocessor 100 and that employs a handshake protocol different from the one described.
The embodiment of Figure 11 is similar in some respects to the embodiment of Figure 9. However, the embodiment of Figure 11 is designed to achieve potentially greater power savings in an environment in which the operating system requests that the microprocessor 100 enter very low power states and tolerates the latencies associated with them. More specifically, the embodiment of Figure 11 facilitates power-gating the cores and waking up only one of the cores when necessary, for example, to handle an interrupt. Embodiments are contemplated in which the microprocessor 100 supports operation in both the mode of Figure 9 and the mode of Figure 11. Furthermore, the mode may be configurable, either at manufacturing time (for example, via the fuses 114) and/or under software control, or it may be decided automatically by the microprocessor 100 according to the particular C-state specified by the MWAIT instructions. Flow begins at block 1102.
In block 1102, the core 102 encounters an MWAIT instruction specifying a target C-state, denoted Cx in Figure 11 (MWAIT Cx). Flow proceeds to block 1104.
In block 1104, the core 102 writes a synchronization request, with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in Figure 11), to its synchronization register 108. The synchronization request also sets the selective wake (SEL WAKE) bit 214 and the PG bit 208. Additionally, the synchronization request specifies in its wake events field 204 that the core 102 is to be awakened on all wake events except the assertion of STPCLK and the de-assertion of STPCLK (~STPCLK). (Preferably, there are other wake events, such as AP startup, for which the synchronization request specifies that the core 102 is not to be awakened.) Consequently, the control unit 104 puts the core 102 to sleep, which includes power-gating the core 102 because the PG bit 208 is set. Additionally, before writing the synchronization request, the core 102 writes back and invalidates its local cache memories and saves its core 102 state (preferably to the private RAM (PRAM) 116). The core 102 restores its state (for example, from the PRAM 116) when it is subsequently awakened (for example, in block 1137, 1132 or 1106). As described above, particularly with respect to Figure 3, when the last core 102 writes its synchronization request with the selective wake bit 214 set, the control unit 104 automatically blocks all wake events for all of the cores 102 except the last-writing core 102 (per block 326). Flow proceeds to block 1106.
In block 1106, once all of the cores 102 have written a SYNC Cx, the control unit 104 wakes up the last-writing core 102. As described above, the control unit 104 keeps the S bits 222 of the other cores 102 set even though it wakes up the last-writing core 102 and clears that core's S bit 222. Prior to block 1106, while the core 102 is asleep, it may be awakened by a wake event, such as an interrupt. However, once the core 102 is awakened in block 1106, the core 102 is still executing microcode as a result of the MWAIT instruction (in block 1102) and remains in microcode until block 1124, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted). In other words, while fewer than all of the cores 102 have received an MWAIT instruction to go to sleep, only individual cores 102 may sleep, and the microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. However, once all of the cores 102 have agreed to enter a package sleep state, as indicated by the occurrence of the synchronization condition in block 1106, the core 102 awakened in block 1106 (the last-writing core 102, which caused the synchronization condition) is allowed to complete the package sleep state handshake protocol with the chipset (for example, blocks 1108, 1109 and 1121 below) without being interrupted and without any of the other cores 102 being interrupted. Flow proceeds to block 1108.
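The selective-wake behavior of blocks 1104 and 1106 can be illustrated with a small model. This is only a sketch under stated assumptions: `SelWakeControlUnit`, `write_sync` and the two lists are hypothetical names, the S bit 222 and the blocking of wake events are modeled as plain booleans, and the power-gating and state save/restore steps are omitted.

```python
class SelWakeControlUnit:
    """Toy model of control unit 104 for SYNC requests with SEL WAKE set.

    Hypothetical class for illustration; not the actual register interface.
    """

    def __init__(self, num_cores):
        self.s_bit = [False] * num_cores    # models S bit 222 of each core
        self.blocked = [False] * num_cores  # True if wake events are blocked

    def write_sync(self, core):
        """Core writes SYNC Cx; returns the core to wake, or None."""
        self.s_bit[core] = True
        if not all(self.s_bit):
            return None  # writer sleeps until every core has a pending request
        # Synchronization condition: block wake events for all but the last writer.
        self.blocked = [c != core for c in range(len(self.s_bit))]
        # Only the last writer's S bit is cleared; the others stay set.
        self.s_bit[core] = False
        return core


cu = SelWakeControlUnit(3)
wakes = [cu.write_sync(c) for c in (0, 1, 2)]
again = cu.write_sync(2)  # same core re-synchronizes later
print(wakes, cu.s_bit, cu.blocked, again)
```

Because the S bits of the sleeping cores remain set in this model, a later SYNC Cx from the awakened core satisfies the synchronization condition immediately, so the same single core can repeat the handshake without the others waking.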
In block 1108, the core 102 writes back and invalidates the shared cache memory 119 and then communicates to the chipset that it may take appropriate actions to reduce power consumption. Flow proceeds to block 1109.
In block 1109, the core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal is not asserted after a predetermined number of clock cycles, the control unit 104 detects this condition, kills the pending synchronization requests of the cores 102, wakes up all of the cores 102 and indicates the error in the error code field 248. Flow proceeds to block 1121.
In block 1121, the core 102 issues a stop grant cycle on the bus to the chipset. Flow proceeds to block 1124.
In block 1124, the core 102 writes a sleep request (that is, a request with the sleep bit 212 set, the S bit 222 clear and the PG bit 208 set) to the synchronization register 108. Additionally, the request specifies in its wake events field 204 that the core 102 is to be awakened only on the wake event of the de-assertion of STPCLK. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1132.
In block 1132, the control unit 104 detects the de-assertion of STPCLK and wakes up the core 102. It is noted that before the control unit 104 wakes up the core 102, it also un-power-gates the core 102. Advantageously, at this point the core 102 is the only core that is running, which gives the core 102 an opportunity to perform any actions that must be performed while no other core 102 is running. Flow proceeds to block 1134.
In block 1134, the core 102 writes to a register (not shown) of the control unit 104 to unblock the wake events of each of the other cores 102, as specified in the wake events field 204 of their respective synchronization registers 108. Flow proceeds to block 1136.
In block 1136, the core 102 handles any pending wake events directed to it. For example, in one embodiment, the system that includes the microprocessor 100 permits both directed interrupts (that is, interrupts directed to a specific core of the microprocessor 100) and non-directed interrupts (that is, interrupts that may be handled by any core 102 of the microprocessor 100, as the microprocessor 100 chooses). An example of a non-directed interrupt is what is commonly referred to as a "low priority interrupt". In one embodiment, the microprocessor 100 preferably directs non-directed interrupts to the single core 102 that was awakened by the de-assertion of STPCLK in block 1132, since that core is already awake and can handle the interrupt, in the hope that the other cores 102 have no pending wake events and can therefore continue to sleep and remain power-gated. Flow returns to block 1104.
If, when the wake events are unblocked in block 1134, no specified wake event is pending for a core 102 other than the core 102 that was awakened in block 1132, then that other core 102 advantageously continues to sleep and remains power-gated, per block 1104. However, if a specified wake event is pending for another core 102 when the wake events are unblocked in block 1134, that other core 102 is un-power-gated and awakened by the control unit 104. In this case, a different flow begins at block 1137 of Figure 11.
In block 1137, another core 102 (that is, a core 102 other than the core 102 that unblocked the wake events in block 1134) is awakened after the wake events are unblocked in block 1134. The other core 102 handles any pending wake events directed to it, for example, by servicing an interrupt. Flow proceeds from block 1137 to block 1104.
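The routing decision made across blocks 1132 to 1137 can be summarized in a short sketch. The function `route_wake_events` and its data representation are hypothetical: each pending wake event is represented only by its target, with `None` standing for a non-directed interrupt.

```python
def route_wake_events(awake_core, pending_targets):
    """Map each pending wake event to the core that would service it.

    pending_targets holds one entry per event: a core id for a directed
    interrupt, or None for a non-directed interrupt (illustrative encoding).
    Returns (plan, extra_wakeups), where plan maps core id -> event indices
    and extra_wakeups lists cores other than awake_core that must be woken.
    """
    plan = {}
    for event, target in enumerate(pending_targets):
        # Non-directed events go to the core that is already awake (block 1136).
        handler = awake_core if target is None else target
        plan.setdefault(handler, []).append(event)
    # Directed events for other cores wake those cores (block 1137).
    extra_wakeups = sorted(set(plan) - {awake_core})
    return plan, extra_wakeups


# Only a non-directed interrupt is pending: no other core needs to wake.
plan_a, extra_a = route_wake_events(2, [None])
# A non-directed interrupt plus an interrupt directed to core 1.
plan_b, extra_b = route_wake_events(2, [None, 1])
print(plan_a, extra_a, plan_b, extra_b)
```

In the first call only the already-awake core 2 does any work; in the second, core 1 must also be un-power-gated and awakened to service its directed interrupt.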
Referring to Figure 12, a timing diagram is shown illustrating an example of the operation of the microprocessor 100 according to the flowchart of Figure 11. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 encounters an MWAIT instruction specifying C-state 7 (MWAIT C7) (per block 1102). In this example, C-state 7 permits power-gating. Core 0 then writes a SYNC C7 with the selective wake bit 214 set (denoted "selective wake" in Figure 12) and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). Core 1 encounters an MWAIT instruction specifying C-state 7 (per block 1102). Core 1 then writes a SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). Core 2 encounters an MWAIT instruction specifying C-state 7 (per block 1102). Core 2 then writes a SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). (However, in the optimized embodiment described with respect to block 314, the last-writing core is not power-gated.) As shown, the time at which each core writes its SYNC C7 may vary.
When the last-writing core writes its SYNC C7 with the selective wake bit 214 set, the control unit 104 blocks all wake events for all of the cores except the last-writing core (per block 326), which in the example of Figure 12 is core 2. Additionally, the control unit 104 wakes up only the last-writing core (per block 1106), which may save power because the other cores continue to sleep and remain power-gated while core 2 performs the handshake protocol with the chipset. Core 2 then issues the I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 2 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on the de-assertion of STPCLK, and is put to sleep and power-gated (per block 1124). The cores may sleep and remain power-gated for a relatively long time.
When STPCLK is de-asserted, the control unit 104 wakes up only core 2 (per block 1132). In the example of Figure 12, the chipset de-asserts STPCLK in response to receiving a non-directed interrupt, which it forwards to the microprocessor 100. The microprocessor 100 directs the non-directed interrupt to core 2, which saves power because the other cores continue to sleep and remain power-gated. Core 2 unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, and is put to sleep and power-gated (per block 1104).
When core 2 writes the SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, the synchronization requests of the other cores are still pending (that is, the S bits 222 of the other cores were not cleared by the wake-up of core 2). The control unit 104 therefore blocks the wake events of all of the cores except core 2, that is, the last-writing core (per block 326). Additionally, the control unit 104 wakes up only core 2 (per block 1106). Core 2 then issues the I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 2 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on the de-assertion of STPCLK, and is put to sleep and power-gated (per block 1124).
When STPCLK is de-asserted, the control unit 104 wakes up only core 2 (per block 1132). In the example of Figure 12, STPCLK was de-asserted because of another non-directed interrupt. The microprocessor 100 therefore directs the interrupt to core 2, which saves power. Core 2 again unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, and is put to sleep and power-gated (per block 1104).
This cycle may continue for a relatively long time, namely, for as long as only non-directed interrupts are generated. Figure 13 shows an example of the handling of an interrupt directed to a core other than the last-writing core.
As may be observed by comparing Figure 10 and Figure 12, in the embodiment of Figure 12, once the cores 102 initially go to sleep (after writing the SYNC C7 in the example of Figure 12), advantageously only one of the cores 102 is awakened again to perform the handshake protocol with the chipset while the other cores 102 remain asleep, which may be a significant advantage if the cores 102 are in a relatively long-latency sleep state. The power savings may be significant, particularly where the operating system recognizes that the workload on the system is small enough for a single core 102 to handle.
Furthermore, as long as no wake events are directed to the other cores 102, advantageously only one of the cores 102 is awakened (to service non-directed events, such as a low priority interrupt). Again, this may be a significant advantage if the cores 102 are in a relatively long-latency sleep state. The power savings may be significant, particularly where there is effectively no workload on the system other than relatively infrequent non-directed interrupts, such as USB interrupts. Still further, even if a wake event occurs that is directed to another core 102 (for example, an interrupt that the operating system directs to a single core 102, such as an operating system timer interrupt), the embodiment may advantageously switch dynamically the single core 102 that performs the package sleep state protocol and services non-directed wake events, as shown in Figure 13, so that the benefit of waking up only a single core 102 is still enjoyed.
Referring to Figure 13, a timing diagram is shown illustrating another example of the operation of the microprocessor 100 according to the flowchart of Figure 11. The example of Figure 13 is similar in many respects to the example of Figure 12. However, at the first de-assertion of STPCLK, the interrupt is directed to core 1 (rather than being a non-directed interrupt as in the example of Figure 12). Consequently, the control unit 104 wakes up core 2 (per block 1132) and subsequently wakes up core 1 after core 2 unblocks the wake events (per block 1134). Core 2 then again writes a SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, and is put to sleep and power-gated (per block 1104).
Core 1 services the directed interrupt (per block 1137). Core 1 then again writes a SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, and is put to sleep and power-gated (per block 1104). In this example, core 2 writes its SYNC C7 before core 1 writes its SYNC C7. Consequently, although core 0 still has its S bit 222 set from its initial SYNC C7, the S bit 222 of core 1 was cleared when core 1 was awakened. Therefore, when core 2 writes its SYNC C7 after unblocking the wake events, it is not the last core to write the SYNC C7 request; rather, core 1 becomes the last core to write the SYNC C7 request.
When core 1 writes the SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, the synchronization request of core 0 is still pending (that is, it was not cleared by the wake-ups of core 1 and core 2), and core 2 (in this example) has already written its SYNC 14 request. The control unit 104 therefore blocks the wake events of all of the cores except core 1, that is, the last-writing core (per block 326). Additionally, the control unit 104 wakes up only core 1 (per block 1106). Core 1 then issues the I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 1 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on the de-assertion of STPCLK, and is put to sleep and power-gated (per block 1124).
When STPCLK is de-asserted, the control unit 104 wakes up only core 1 (per block 1132). In the example of Figure 13, STPCLK was de-asserted because of a non-directed interrupt; therefore, the microprocessor 100 directs the non-directed interrupt to core 1, which may save power. The cycle in which core 1 handles non-directed interrupts may continue for a relatively long time, namely, for as long as only non-directed interrupts are generated. In this manner, the microprocessor 100 may advantageously save power by directing non-directed interrupts to the core 102 to which the most recent interrupt was directed, which in the example of Figure 13 involved switching to a different core. Core 1 again unblocks the wake events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 1 then again writes a SYNC C7 with the selective wake bit 214 set and the PG bit 208 set, and is put to sleep and power-gated (per block 1104).
It should be understood that although embodiments have been described in which the power management instruction is an x86 MWAIT instruction, other embodiments are contemplated in which synchronization requests are used to carry out other power management instructions. For example, the microprocessor 100 may perform similar operations in response to a read from one of a set of predetermined I/O port addresses associated with the different C-states. As another example, the power management instructions may belong to an instruction set architecture other than the x86 architecture.
Dynamic reconfiguration of a multi-core processor
Each core 102 of the microprocessor 100 generates configuration-related values based on the configuration of the cores 102 of the microprocessor 100. Preferably, the microcode of each core 102 generates, stores and uses the configuration-related values. Embodiments are described below in which the generation of the configuration-related values may advantageously be dynamic. Examples of configuration-related values include, but are not limited to, the following.
Each core 102 generates a global core number, described above with respect to Figure 2. In contrast to the local core number 256, which identifies the core 102 only relative to the cores 102 of the die 406 on which it resides, the global core number identifies the core 102 relative to all of the cores 102 of the microprocessor 100. In one embodiment, the core 102 generates its global core number as the sum of its local core number 256 and the product of its die number 258 and the number of cores 102 per die, as follows:
Global core number = (die number × number of cores per die) + local core number.
Each core 102 also generates a virtual core number. The virtual core number is the global core number minus the number of disabled cores 102 whose global core number is lower than the global core number of the instant core 102. Thus, where all of the cores 102 of the microprocessor 100 are enabled, the global core number and the virtual core number are the same. However, if one or more of the cores 102 are disabled, for example, because they are defective, the virtual core number of a core 102 may differ from its global core number. In one embodiment, each core 102 populates the APIC ID field of its respective APIC ID register with its virtual core number. However, other embodiments are described (for example, with respect to Figures 22 and 23) in which this is not the case. Furthermore, in one embodiment, the operating system may update the APIC ID in the APIC ID register.
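The two numbering schemes can be worked through with a short sketch. The function names and the list-of-booleans representation of the enable bits are assumptions made for illustration; the arithmetic follows the definitions given above.

```python
def global_core_number(die_number, cores_per_die, local_core_number):
    """Global core number = (die number * cores per die) + local core number."""
    return die_number * cores_per_die + local_core_number


def virtual_core_number(global_number, enabled):
    """Global core number minus the count of disabled cores numbered below it.

    enabled[i] is True if the core with global core number i is enabled
    (an illustrative representation of the per-core enable bits).
    """
    disabled_below = sum(1 for is_on in enabled[:global_number] if not is_on)
    return global_number - disabled_below


# Two dies of four cores each; look at local core 2 on die 1.
g = global_core_number(die_number=1, cores_per_die=4, local_core_number=2)

# All eight cores enabled: virtual and global numbers coincide.
all_enabled = [True] * 8
v_all = virtual_core_number(g, all_enabled)

# Cores 1 and 4 disabled: two disabled cores sit below core 6.
some_disabled = [True, False, True, True, False, True, True, True]
v_some = virtual_core_number(g, some_disabled)
print(g, v_all, v_some)
```

With both dies fully enabled the core keeps the number 6; with cores 1 and 4 disabled its virtual core number drops to 4, so the enabled cores are still numbered contiguously from zero.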
Each core 102 also generates a BSP flag, which indicates whether the core 102 is the BSP. In one embodiment, normally (for example, when the "all cores BSP" feature of Figure 23 is disabled) one core 102 designates itself the bootstrap processor (BSP), and each of the other cores 102 designates itself an application processor (AP). After a reset, the AP cores 102 initialize themselves and then go to sleep, waiting for the BSP to tell them to begin fetching and executing instructions. In contrast, after initializing itself, the BSP core 102 immediately begins fetching and executing the instructions of the system firmware, for example, BIOS boot code, which initializes the system (for example, verifies that the system memory and peripherals are working properly and initializes and/or configures them) and bootstraps the operating system, that is, loads the operating system (for example, from disk) and transfers control to it. Before bootstrapping the operating system, the BSP determines the system configuration (for example, the number of cores 102 or logical processors in the system) and stores it in memory so that the operating system can read it after it boots. After the operating system is bootstrapped, it instructs the AP cores 102 to begin fetching and executing operating system instructions. In one embodiment, normally (for example, when the "modify BSP" and "all cores BSP" features of Figures 22 and 23, respectively, are disabled) a core 102 designates itself the BSP if its virtual core number is zero, and every other core 102 designates itself an AP core 102. Preferably, a core 102 populates the BSP flag bit of the APIC base address register of its respective APIC with its BSP flag configuration-related value. According to one embodiment, as described above, the BSP is the main core 102 of blocks 907 and 919, which performs the package sleep state handshake protocol of Figure 9.
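The normal designation rule (virtual core number zero becomes the BSP) can be sketched as follows. The function `designate_roles` is a hypothetical helper, and the sketch covers only the default case in which the "modify BSP" and "all cores BSP" features are disabled.

```python
def designate_roles(enabled):
    """Return {global core number: "BSP" or "AP"} for the enabled cores.

    enabled[i] is True if the core with global core number i is enabled.
    Assumes the default rule only: the enabled core whose virtual core
    number is zero is the BSP, and every other enabled core is an AP.
    """
    roles = {}
    virtual = 0  # virtual core number of the next enabled core
    for global_number, is_on in enumerate(enabled):
        if not is_on:
            continue  # disabled cores take no role and consume no number
        roles[global_number] = "BSP" if virtual == 0 else "AP"
        virtual += 1
    return roles


roles_all = designate_roles([True, True, True, True])
roles_core0_off = designate_roles([False, True, True, True])
print(roles_all, roles_core0_off)
```

When core 0 is disabled, core 1 has virtual core number zero and therefore takes over the BSP role, which is one reason the rule is stated in terms of the virtual rather than the global core number.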
Each core 102 also generates an APIC base value for populating the APIC base register. The APIC base address is generated based on the APIC ID of the core 102. In one embodiment, the operating system may update the APIC base address in the APIC base address register.
Each core 102 also generates a die master indicator, which indicates whether the core 102 is the main core 102 of the die 406 on which it resides.
Each core 102 also generates a chip master indicator, which indicates whether the core 102 is the main core 102 of the chip on which it resides, assuming that the microprocessor 100 is configured with chips, as described in detail above.
Each core 102 computes the configuration-related values and operates using them so that the system that includes the microprocessor 100 operates properly. For example, the system directs interrupt requests to the cores 102 based on their associated APIC IDs. The APIC ID determines which interrupt requests the core 102 responds to. More specifically, each interrupt request includes a destination identifier, and a core 102 responds to an interrupt request only if the destination identifier matches the APIC ID of the core 102 (or if the destination identifier is a special value indicating that the request is for all of the cores 102). As another example, each core 102 must know whether it is the BSP so that it executes the initial BIOS code and bootstraps the operating system and, in one embodiment, performs the package sleep state handshake protocol described with respect to Figure 9. Embodiments are described below (see Figures 22 and 23) in which the BSP flag and APIC ID may be altered from their normal values for specific purposes, such as testing and/or debugging.
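The destination-matching rule for interrupt requests can be expressed as a small predicate. This is a sketch under stated assumptions: `core_responds` is a hypothetical function, and the value `0xFF` for the "all cores" identifier is an assumption chosen for illustration, since the text only says that a special value exists.

```python
ALL_CORES_ID = 0xFF  # assumed special "all cores" destination value


def core_responds(apic_id, destination_id, all_cores_id=ALL_CORES_ID):
    """True if a core with this APIC ID should respond to the request."""
    return destination_id in (apic_id, all_cores_id)


apic_ids = [0, 1, 2, 3]
# A request whose destination identifier is 2 reaches only the matching core.
directed = [a for a in apic_ids if core_responds(a, 2)]
# A request carrying the special value reaches every core.
broadcast = [a for a in apic_ids if core_responds(a, ALL_CORES_ID)]
print(directed, broadcast)
```

Because the comparison is against the APIC ID, and the APIC ID is by default populated from the virtual core number, a change in which cores are enabled changes which core answers a given destination identifier.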
Referring to Figure 14, a flowchart illustrating dynamic reconfiguration of the microprocessor 100 is shown. The description of Figure 14 refers to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described may be used with a microprocessor 100 having a different configuration, i.e., with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively reconfigure the microprocessor 100 dynamically. Flow begins at block 1402.
At block 1402, the microprocessor 100 is reset, and the hardware of the microprocessor 100 populates the configuration register 112 of each core 102 with the appropriate values based on the number of available cores 102 and the die on which the core 102 resides. In one embodiment, the local core number 256 and the die number 258 are hardwired. As described above, the hardware may determine whether a core 102 is enabled or disabled from the blown or unblown state of the fuses 114. Flow proceeds to block 1404.
At block 1404, the core 102 reads the configuration word 252 from the configuration register 112. The core 102 then generates its configuration-related values based on the value of the configuration word 252 that was populated at block 1402. In a multi-die microprocessor 100 configuration, the configuration-related values generated at block 1404 do not take into account the cores 102 of the other die 406. However, the configuration-related values generated at blocks 1414 and 1424 (and at block 1524 of Figure 15) do take into account the cores 102 of the other die 406, as described below. Flow proceeds to block 1406.
At block 1406, the core 102 causes the values of the enable bits 254 of the local cores 102 in the local configuration register 112 to be propagated to the corresponding enable bits 254 of the configuration register 112 of the remote die 406. For example, with reference to the configuration of Figure 4, a core 102 on die A 406A causes the enable bits 254 associated with cores A, B, C and D (the local cores) in the configuration register 112 of die A 406A (the local die) to be propagated to the enable bits 254 associated with cores A, B, C and D in the configuration register 112 of die B 406B (the remote die). Conversely, a core 102 on die B 406B causes the enable bits 254 associated with cores E, F, G and H (the local cores) in the configuration register 112 of die B 406B (the local die) to be propagated to the enable bits 254 associated with cores E, F, G and H in the configuration register 112 of die A 406A (the remote die). In one embodiment, the core 102 propagates to the other die 406 by writing the local configuration register 112. Preferably, the write by the core 102 to the local configuration register 112 does not change the local configuration register 112, but causes the local control unit 104 to propagate the local enable bit 254 values to the remote die 406. Flow proceeds to block 1408.
At block 1408, the core 102 writes a sync request for sync condition 8 (denoted SYNC 8 in Figure 14) to its sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1412.
At block 1412, the control unit 104 wakes up the core 102 when all available cores 102 in the core set specified by the core set field 228 have written a SYNC 8. It should be noted that, in a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence. That is, the control unit 104 waits to wake up the core 102 (or to interrupt it, in the case that the core 102 did not set the sleep bit 212 and thereby elected not to sleep) until all cores 102 specified in the core set field 228 (which may include cores 102 on other dies 406) have written their sync requests. Flow proceeds to block 1414.
At block 1414, the core 102 reads the configuration register 112 again and generates its configuration-related values based on the new value of the configuration word 252, which now includes the correct values of the enable bits 254 propagated from the remote die. Flow proceeds to decision block 1416.
At decision block 1416, the core 102 determines whether it should disable itself. In one embodiment, the core 102 determines that it must disable itself because a fuse 114, which the microcode reads during its reset processing (before decision block 1416), is blown to indicate that the core 102 should disable itself. The fuse 114 may be blown during or after manufacture of the microprocessor 100. In another embodiment, updated fuse 114 values may be scanned into the holding registers, as described above, and the scanned-in values indicate that the core 102 should be disabled. Figure 15 describes another embodiment in which the core 102 determines in a different manner that it should be disabled. If the core 102 determines that it should be disabled, flow proceeds to block 1417; otherwise, flow proceeds to block 1418.
At block 1417, the core 102 writes the disable core bit 236 to remove itself from the list of available cores 102, e.g., to clear its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. The core 102 then prevents itself from executing any more instructions, preferably by setting one or more bits that turn off its clock signals and remove its power. Flow ends at block 1417.
At block 1418, the core 102 writes a sync request for sync condition 9 (denoted SYNC 9 in Figure 14) to its sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1422.
At block 1422, the control unit 104 wakes up the core 102 when all enabled cores 102 have written a SYNC 9. Again, in a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence based on the updated values in the configuration register 112. Furthermore, when the control unit 104 determines whether a sync condition has occurred, it excludes from consideration the core 102 that disabled itself at block 1417. More specifically, consider the case in which all the other cores 102 (other than the core 102 disabling itself) write a SYNC 9 before the core 102 that disables itself writes its sync register 108 at block 1417. In that case, the control unit 104 detects the occurrence of the sync condition (at block 316) when the core 102 that disables itself writes the sync register 108 with the disable core bit 236 set at block 1417. This is because the enable bit 254 of the disabled core 102 is then clear, so the control unit 104 no longer considers the disabled core 102 when determining whether the sync condition has occurred. That is, because all the enabled cores 102, which do not include the disabled core 102, have written a SYNC 9, the control unit 104 determines that the sync condition has occurred regardless of whether the disabled core 102 has written a SYNC 9. Flow proceeds to block 1424.
At block 1424, the core 102 reads the configuration register 112 again. If a core 102 was disabled by the operation of another core 102 at block 1417, the new value of the configuration word 252 reflects the disabled core 102. The core 102 then generates its configuration-related values again from the new value of the configuration word 252, in a manner similar to block 1414. The presence of a disabled core 102 may cause some of the configuration-related values to differ from the values generated at block 1414. For example, as described above, the virtual core number, APIC ID, BSP flag, APIC base address, die master and chip master may change because of the presence of a disabled core 102. In one embodiment, after the configuration-related values are generated, one of the cores 102 (e.g., the BSP) writes some of the configuration-related values that are global to all the cores 102 of the microprocessor 100 to the private random access memory 116 of the uncore, so that they can subsequently be read by all the cores 102. For example, in one embodiment, the global configuration-related values are read by a core 102 in order to execute an architectural instruction (e.g., the x86 CPUID instruction) that requests global information about the microprocessor 100, such as the number of cores 102 of the microprocessor 100. Flow proceeds to block 1426.
At block 1426, the core 102 comes out of reset and begins fetching architectural instructions. Flow ends at block 1426.
Referring to Figure 15, a flowchart illustrating dynamic reconfiguration of the microprocessor 100 according to another embodiment is shown. The description of Figure 15 refers to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described may be used with a microprocessor 100 having a different configuration, i.e., with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively reconfigure the microprocessor 100 dynamically. More specifically, Figure 15 describes the operation of one core 102 that encounters a core disable instruction, whose flow begins at block 1502, and the operation of the other cores 102, whose flow begins at block 1532.
At block 1502, one of the cores 102 encounters an instruction that instructs the core 102 to disable itself. In one embodiment, the instruction is an x86 WRMSR instruction. In response, the core 102 sends a reconfigure message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 remains in microcode with interrupts disabled (i.e., the microcode does not allow itself to be interrupted) while it responds to the instruction to disable itself (at block 1502) or to the interrupt (at block 1532), and it stays in microcode until block 1526. Flow proceeds from block 1502 to block 1504.
At block 1532, one of the other cores 102 (i.e., a core other than the core 102 that encountered the disable instruction at block 1502) is interrupted by the inter-core interrupt sent at block 1502 and receives the reconfigure message. As stated above, although the flow at block 1532 is described from the perspective of a single core 102, each of the other cores 102 (i.e., those not at block 1502) is interrupted at block 1532, receives the message, and performs the steps of blocks 1504 through 1526. Flow proceeds from block 1532 to block 1504.
At block 1504, the core 102 writes a sync request for sync condition 10 (denoted SYNC 10 in Figure 15) to its sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1506.
At block 1506, the control unit 104 wakes up the core 102 when all available cores 102 have written a SYNC 10. It should be noted that, in a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence. That is, the control unit 104 waits to wake up the core 102 (or to interrupt it, in the case that the core 102 has not elected to go to sleep) until all cores 102 that are specified in the core set field 228 (which may include cores 102 on other dies 406) and that are enabled (as indicated by the enable bits 254) have written their sync requests. Flow proceeds to decision block 1508.
At decision block 1508, the core 102 determines whether it is the core 102 that was instructed at block 1502 to disable itself. If so, flow proceeds to block 1517; otherwise, flow proceeds to block 1518.
At block 1517, the core 102 writes the disable core bit 236 to remove itself from the list of available cores 102, e.g., to clear its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. The core 102 then prevents itself from executing any more instructions, preferably by setting one or more bits that turn off its clock signals and remove its power. Flow ends at block 1517.
At block 1518, the core 102 writes a sync request for sync condition 11 (denoted SYNC 11 in Figure 15) to its sync register 108. In response, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1522.
At block 1522, the control unit 104 wakes up the core 102 when all enabled cores 102 have written a SYNC 11. Again, in a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence based on the updated values in the configuration register 112. Furthermore, when the control unit 104 determines whether a sync condition has occurred, it excludes from consideration the core 102 that disabled itself at block 1517. More specifically, consider the case in which all the other cores 102 (other than the core 102 disabling itself) write a SYNC 11 before the core 102 that disables itself writes its sync register 108 at block 1517. Because the enable bit 254 of the disabled core 102 is clear, the control unit 104 no longer considers the disabled core 102 when determining whether the sync condition has occurred. Therefore, the control unit 104 detects the occurrence of the sync condition (at block 316) when the core 102 that disables itself writes the sync register 108 at block 1517 (see Figure 16). That is, because all the enabled cores 102 have written a SYNC 11, the control unit 104 determines that the sync condition has occurred regardless of whether the disabled core 102 has written a SYNC 11. Flow proceeds to block 1524.
At block 1524, the core 102 reads the configuration register 112, whose configuration word 252 reflects the core 102 that was disabled at block 1517. The core 102 then generates its configuration-related values from the new value of the configuration word 252. Preferably, the disable instruction of block 1502 is executed by system firmware (e.g., BIOS setup), and after the core 102 has been disabled the system firmware performs a reboot of the system, e.g., after block 1526. During the reboot, the microprocessor 100 may operate differently than it did before the configuration-related values were generated at block 1524. For example, the BSP during the reboot may be a different core 102 than before the configuration-related values were generated. As another example, the system configuration information (e.g., the number of cores 102 and logical processors in the system) that the BSP determines and stores to memory before bootstrapping the operating system, for the operating system to read, may differ. As another example, the APIC ID of a core 102 that is still in use may differ from its APIC ID before the configuration-related values were generated, in which case the operating system directs interrupt requests to, and the core 102 responds to, different interrupt requests than under the previous configuration-related values. As yet another example, the master core 102 that performs the package sleep state handshake protocol of Figure 9 at blocks 907 and 919 may be a different core 102 than under the previous configuration-related values. Flow proceeds to block 1526.
At block 1526, the core 102 resumes the task it was performing before it was interrupted at block 1532. Flow ends at block 1526.
The dynamic reconfiguration of the microprocessor 100 described herein may be used in a variety of applications. For example, dynamic reconfiguration may be used for testing and/or simulation during development of the microprocessor 100, and/or for testing in the field. Additionally, a user may want to know the performance and/or the amount of power consumed by the system when it runs a particular application using only a subset of the cores 102. In one embodiment, when a core 102 is disabled, its clocks may be stopped and/or its power removed so that it consumes essentially no power. Furthermore, in a high-reliability system, each core 102 may periodically check the other cores 102 for faults, and if the cores 102 select a particular core 102 as faulty, the non-faulty cores may disable the faulty core 102 and cause the remaining cores 102 to perform a dynamic reconfiguration as described above. In such an embodiment, the control word 202 may include an additional field that enables the writing core 102 to specify the core 102 to be disabled, and the operation described with respect to Figure 15 is modified so that at block 1517 a core may disable a core 102 other than itself.
Referring to Figure 16, a timing diagram illustrating an example of operation of the microprocessor 100 according to the flowchart of Figure 15 is shown. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102 and may be a single-die or a multi-die microprocessor 100. In the timing diagram, time advances downward.
Core 1 encounters an instruction to disable itself and, in response, sends a reconfigure message to core 0 and core 2 and interrupts them (per block 1502). Core 1 then writes a SYNC 10 and goes to sleep (per block 1504).
Core 0 and core 2 are each eventually interrupted from their current task and read the message (per block 1532). In response, core 0 and core 2 each write a SYNC 10 and go to sleep (per block 1504). As shown, the time at which each core writes its SYNC 10 may differ, for example because of the latency of the instruction that is executing when the interrupt is asserted.
When all the cores 102 have written a SYNC 10, the control unit 104 wakes up all the cores simultaneously (per block 1506). Core 0 and core 2 then determine that they are not disabling themselves (per decision block 1508), write a SYNC 11 and go to sleep (per block 1518). Core 1, however, determines that it is disabling itself, so it writes its disable core bit 236 (per block 1517). In this example, core 1 writes its disable core bit 236 after core 0 and core 2 have written their respective SYNC 11, as shown. Nevertheless, the control unit 104 detects the occurrence of the sync condition because it determines that the S bit 222 is set for every core 102 whose enable bit 254 is set. That is, even though the S bit 222 of core 1 is not set, the enable bit 254 of core 1 was cleared by its write to the sync register 108 at block 1517.
When all the available cores have written a SYNC 11, the control unit 104 wakes up all the cores simultaneously (per block 1522). As described above, in the case of a multi-die microprocessor 100, when core 1 writes its disable core bit 236 and the local control unit 104 responsively clears the local enable bit 254 of core 1, the local control unit 104 also propagates the local enable bits 254 to the remote die 406. Consequently, the remote control unit 104 also detects the occurrence of the sync condition and simultaneously wakes up all the available cores of its die 406. Core 0 and core 2 then generate their configuration-related values based on the updated value of the configuration register 112 (per block 1524) and resume the activity they were performing before the interrupt (per block 1526).
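The rule that only enabled cores count toward a sync condition can be illustrated with a small Python model of the Figure 16 sequence. This is a simplified sketch, not the control unit 104 itself: the class and method names are hypothetical, the core set field 228 and multi-die propagation are ignored, and the pending condition number stands in for the S bit 222.

```python
class ControlUnitModel:
    """Toy model: a sync condition occurs when every ENABLED core has written it."""

    def __init__(self, num_cores):
        self.enabled = [True] * num_cores   # models the enable bits 254
        self.pending = [None] * num_cores   # condition written by each core

    def _sync_occurred(self):
        waiting = [self.pending[i] for i, on in enumerate(self.enabled) if on]
        if not waiting or any(c is None for c in waiting):
            return False
        if len(set(waiting)) != 1:
            return False                     # cores wait on different conditions
        self.pending = [None] * len(self.pending)  # all waiters are woken
        return True

    def write_sync(self, core, condition):
        """Core writes a sync request; returns True if the condition occurred."""
        self.pending[core] = condition
        return self._sync_occurred()

    def write_disable(self, core):
        """Core sets its disable core bit; its enable bit is cleared."""
        self.enabled[core] = False
        self.pending[core] = None
        return self._sync_occurred()
```

Replaying Figure 16 with this model, the SYNC 11 written by core 0 and core 2 does not complete until core 1 disables itself, at which point the condition is detected even though core 1 never wrote a SYNC 11.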
Hardware Semaphore
Referring to Figure 17, a block diagram of the hardware semaphore 118 of Figure 1 is shown. The hardware semaphore 118 includes an owned bit 1702, owner bits 1704 and a state machine 1706, which updates the owned bit 1702 and the owner bits 1704 in response to reads and writes of the hardware semaphore 118 by the cores 102. Preferably, in order to identify the core that currently owns the hardware semaphore 118, the number of owner bits 1704 is the base-2 logarithm of the number of cores 102 configured in the microprocessor 100. In another embodiment, the owner bits 1704 include one bit for each core 102 of the microprocessor 100. It should be noted that although a single set of owned bit 1702, owner bits 1704 and state machine 1706 is described as implementing one hardware semaphore 118, the microprocessor 100 may include multiple hardware semaphores 118, each of which includes such a set of hardware. Preferably, microcode running on each core 102 reads and writes the hardware semaphore 118 to obtain ownership of a resource shared by the cores 102 in order to perform operations that require exclusive access to the shared resource, as described in detail in the examples below. The microcode may associate each of the multiple hardware semaphores 118 with ownership of a different shared resource of the microprocessor 100. Preferably, the hardware semaphore 118 is read and written by the cores 102 at a predetermined address within a non-architectural address space of the cores 102. The non-architectural address space can be accessed only by the microcode of a core 102 and cannot be accessed directly by user code (e.g., x86 architecture program instructions). The operation of the state machine 1706 in updating the owned bit 1702 and the owner bits 1704 of the hardware semaphore 118 is described with respect to Figures 18 and 19, and uses of the hardware semaphore 118 are described thereafter.
Referring to Figure 18, a flowchart illustrating operation of the hardware semaphore 118 when it is read by a core 102 is shown. Flow begins at block 1802.
At block 1802, a core 102, denoted core x, reads the hardware semaphore 118. As described above, preferably the microcode of the core 102 reads the predetermined address at which the hardware semaphore 118 resides in the non-architectural address space. Flow proceeds to decision block 1804.
At decision block 1804, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to block 1808; otherwise, flow proceeds to block 1806.
At block 1806, the hardware semaphore 118 returns a value of zero to the reading core 102 to indicate that the core 102 does not own the hardware semaphore 118. Flow ends at block 1806.
At block 1808, the hardware semaphore 118 returns a value of one to the reading core 102 to indicate that the core 102 owns the hardware semaphore 118. Flow ends at block 1808.
As described above, the microprocessor 100 may include multiple hardware semaphores 118. In one embodiment, the microprocessor 100 includes 16 hardware semaphores 118, and when a core 102 reads the predetermined address, it receives a 16-bit data value in which each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 reading the predetermined address owns the corresponding hardware semaphore 118.
Referring to Figure 19, a flowchart illustrating operation of the hardware semaphore 118 when it is written by a core 102 is shown. Flow begins at block 1902.
At block 1902, a core 102, denoted core x, writes the hardware semaphore 118, e.g., at the predetermined non-architectural address described above. Flow proceeds to decision block 1904.
At decision block 1904, the state machine 1706 examines the owned bit 1702 to determine whether the hardware semaphore 118 is owned by any core 102 or is free. If it is owned, flow proceeds to decision block 1914; otherwise, flow proceeds to decision block 1906.
At decision block 1906, the state machine 1706 examines the value written. If the value is 1, which indicates that the core 102 wants to obtain ownership of the hardware semaphore 118, flow proceeds to block 1908. If the value is 0, which indicates that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1912.
At block 1908, the state machine 1706 updates the owned bit 1702 to 1 and sets the owner bits 1704 to indicate that core x now owns the hardware semaphore 118. Flow ends at block 1908.
At block 1912, the state machine 1706 updates neither the owned bit 1702 nor the owner bits 1704. Flow ends at block 1912.
At decision block 1914, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to decision block 1916; otherwise, flow proceeds to block 1912.
At decision block 1916, the state machine 1706 examines the value written. If the value is 1, which indicates that the core 102 wants to obtain ownership of the hardware semaphore 118, flow proceeds to block 1912 (no update occurs because the core 102 already owns the hardware semaphore 118, as determined at decision block 1914). If the value is 0, which indicates that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1918.
At block 1918, the state machine 1706 updates the owned bit 1702 to zero to indicate that no core 102 now owns the hardware semaphore 118. Flow ends at block 1918.
As described above, in one embodiment the microprocessor 100 includes 16 hardware semaphores 118. When a core 102 writes the predetermined address, it writes a 16-bit data value in which each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the core 102 writing the predetermined address requests ownership (value one) or relinquishes ownership (value zero) of the corresponding hardware semaphore 118.
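The read and write behavior of Figures 18 and 19 can be summarized in a short Python model. It is a behavioral sketch under stated assumptions, not the hardware: the class names are hypothetical, and the read path also checks the owned bit (an assumption, since block 1918 as described clears only the owned bit and leaves the owner bits unchanged).

```python
class HardwareSemaphoreModel:
    """Behavioral model of one semaphore: owned bit plus owner identifier."""

    def __init__(self):
        self.owned = False   # models the owned bit 1702
        self.owner = None    # models the owner bits 1704

    def read(self, core):
        # Figure 18: return 1 only to the core that currently owns the semaphore.
        return 1 if self.owned and self.owner == core else 0

    def write(self, core, value):
        # Figure 19: a free semaphore is taken by a write of 1 (block 1908);
        # an owned semaphore is released only by its owner writing 0 (block 1918);
        # every other combination changes nothing (block 1912).
        if not self.owned:
            if value == 1:
                self.owned, self.owner = True, core
        elif self.owner == core and value == 0:
            self.owned = False


class SemaphoreBankModel:
    """Sixteen semaphores accessed together as one 16-bit value."""

    def __init__(self, count=16):
        self.sems = [HardwareSemaphoreModel() for _ in range(count)]

    def read(self, core):
        value = 0
        for bit, sem in enumerate(self.sems):
            value |= sem.read(core) << bit
        return value

    def write(self, core, value):
        for bit, sem in enumerate(self.sems):
            sem.write(core, (value >> bit) & 1)
```

Because a write of 0 by a non-owner is a no-op, a core can write a 16-bit value that requests some semaphores without disturbing semaphores held by other cores. Note that in this model a zero bit also releases any semaphore the writing core already holds, which follows from the per-bit request and relinquish meaning described above.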
In one embodiment, arbitration logic arbitrates among the requests by the cores 102 to access the hardware semaphore 118 so that reads and writes of the hardware semaphore 118 by the cores 102 are serialized. In one embodiment, the arbitration logic uses a round-robin fairness algorithm among the cores 102 for access to the hardware semaphore 118.
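Round-robin arbitration of this kind can be sketched as follows. This is a generic illustration of the technique with hypothetical names; the source does not describe the arbitration logic in further detail.

```python
class RoundRobinArbiter:
    """Grant one requester per cycle, starting after the last core granted."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.last = num_cores - 1  # so that core 0 has priority first

    def grant(self, requesting):
        """Return the core granted access this cycle, or None if none request."""
        for offset in range(1, self.num_cores + 1):
            candidate = (self.last + offset) % self.num_cores
            if candidate in requesting:
                self.last = candidate
                return candidate
        return None
```

Since the search always starts just past the most recently granted core, no requesting core waits for more than one grant to each of the other cores, and each access to the semaphore is handled one at a time.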
Referring to Figure 20, a flowchart illustrating operation of the microprocessor 100 using the hardware semaphore 118 to perform an operation that requires exclusive ownership of a resource is shown. More specifically, the hardware semaphore 118 is used to ensure that only one core 102 at a time writes back and invalidates the shared cache memory 119 in the case where two or more cores 102 each encounter an instruction to write back and invalidate the shared cache memory 119. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively ensure that only one core 102 at a time performs the write-back-and-invalidate operation. That is, the operation of Figure 20 ensures that the processing of WBINVD instructions is serialized. In one embodiment, the operation of Figure 20 may be performed in a microprocessor 100 that executes the WBINVD instruction according to the embodiment of Figures 7A~7B. Flow begins at block 2002.
At block 2002, a core 102 encounters a cache control instruction, such as a WBINVD instruction. Flow proceeds to block 2004.
At block 2004, the core 102 writes a 1 to the WBINVD hardware semaphore 118. In one embodiment, the microcode has allocated one of the hardware semaphores 118 to WBINVD operations. The core 102 then reads the WBINVD hardware semaphore 118 to determine whether it obtained ownership. Flow proceeds to decision block 2006.
At decision block 2006, if the core 102 determines that it obtained ownership of the WBINVD hardware semaphore 118, flow proceeds to block 2008; otherwise, flow returns to block 2004 to attempt again to obtain ownership. It should be noted that while the microcode of the instant core 102 is looping between blocks 2004 and 2006, it will eventually be interrupted by the core 102 that owns the WBINVD hardware semaphore 118, because that core 102 is executing a WBINVD instruction according to Figures 7A~7B and sends an interrupt to the instant core 102 at block 702. Preferably, on each pass through the loop the microcode of the instant core 102 checks an interrupt status register to see whether one of the other cores 102 (e.g., the core 102 that owns the WBINVD hardware semaphore 118) has sent an interrupt to the instant core 102. The instant core 102 then performs the operations of Figures 7A~7B and, at block 749, resumes operation according to Figure 20 to attempt to obtain ownership of the hardware semaphore 118 in order to execute its own WBINVD instruction.
At block 2008, the core 102 has obtained ownership, and flow proceeds to block 702 of Figures 7A~7B to execute the WBINVD instruction. As part of the WBINVD instruction operation, at block 748 of Figures 7A~7B the core 102 writes a zero to the WBINVD hardware semaphore 118 to relinquish its ownership. Flow ends at block 2008.
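The acquire, operate and release pattern of Figure 20 can be sketched in Python as follows. This is a minimal illustration, not the microcode: `SemaphoreModel`, `run_wbinvd`, the `do_wbinvd` and `service_interrupts` callbacks and the `max_spins` bound are all hypothetical names introduced here, and the write-back itself is abstracted into a callback.

```python
class SemaphoreModel:
    """Single semaphore with the take/release rules of Figure 19."""

    def __init__(self):
        self.owner = None  # None means the semaphore is free

    def write(self, core, value):
        if self.owner is None and value == 1:
            self.owner = core          # free semaphore taken by the writer
        elif self.owner == core and value == 0:
            self.owner = None          # only the owner can release

    def read(self, core):
        return 1 if self.owner == core else 0


def run_wbinvd(core, sem, do_wbinvd, service_interrupts, max_spins=1000):
    """Try to own the semaphore, perform the operation, then release it."""
    for _ in range(max_spins):         # bound added only to keep the sketch finite
        sem.write(core, 1)             # block 2004: request ownership
        if sem.read(core) == 1:        # decision block 2006: did we get it?
            try:
                do_wbinvd(core)        # block 2008: perform the operation
            finally:
                sem.write(core, 0)     # relinquish ownership (block 748)
            return True
        service_interrupts(core)       # the owner may interrupt a waiting core
    return False
```

In this sketch a waiting core calls `service_interrupts` on every pass, mirroring the check of the interrupt status register described above, so that the owning core is never blocked by a core that is spinning on the semaphore.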
An operation similar to that described with respect to Figure 20 may be performed by the microcode to obtain exclusive ownership of other shared resources. Another resource of which a core 102 may obtain exclusive ownership by using a hardware semaphore 118 is a register of the uncore 103 that is shared by the cores 102. In one embodiment, the uncore 103 registers include a control register that contains a respective field for each core 102. The field controls aspects of the operation of the respective core 102. Because the fields reside in the same register, when a core 102 wants to update its own field without updating the fields of the other cores 102, the core 102 must read the control register, modify the value read, and then write the modified value back to the control register. For example, the microprocessor 100 may include an uncore 103 performance control register (PCR) that controls the bus clock ratio of the cores 102. To update its bus clock ratio, a given core 102 must read, modify and write back the PCR. Therefore, in one embodiment, the microcode is configured to perform an effectively atomic read/modify/write-back of the PCR while the core 102 owns the hardware semaphore 118 associated with the PCR. The bus clock ratio determines the clock frequency of an individual core 102 as a multiple of the clock frequency supplied to the microprocessor 100 via an external bus.
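A read/modify/write-back of a shared register guarded by a semaphore might look like the following Python sketch. The field width `FIELD_BITS`, the `UncoreModel` class and the function name are assumptions made for illustration; the source does not give the PCR layout.

```python
FIELD_BITS = 8  # assumed width of each core's field in the shared register


class UncoreModel:
    """Shared control register plus the semaphore that guards it."""

    def __init__(self):
        self.pcr = 0           # shared register holding one field per core
        self.sem_owner = None  # owner of the semaphore associated with the PCR

    def acquire(self, core):
        if self.sem_owner is None:
            self.sem_owner = core
        return self.sem_owner == core

    def release(self, core):
        if self.sem_owner == core:
            self.sem_owner = None


def set_bus_clock_ratio(uncore, core, ratio):
    """Update only this core's field while holding the semaphore."""
    if not uncore.acquire(core):
        return False                         # caller retries later
    try:
        shift = core * FIELD_BITS
        mask = ((1 << FIELD_BITS) - 1) << shift
        value = uncore.pcr                   # read
        value = (value & ~mask) | ((ratio << shift) & mask)  # modify own field
        uncore.pcr = value                   # write back
    finally:
        uncore.release(core)
    return True
```

Without the semaphore, two cores could both read the old register value and the later write-back would silently undo the other core's field update; holding the semaphore across all three steps is what makes the sequence effectively atomic.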
Another resource is a reliable platform module (Trusted Platform Module, TPM).In one embodiment, Microprocessor 100 performs a reliable platform module of running microcode in core 102.In the given instant time, operation In a core 102 and core 102, the microcode of one of them implements TPM.However, implementing TPM core 102 may change over time.It is logical Cross and can ensure that an only core 102 implements TPM in the time using the hardware semaphore 118 associated with TPM, the microcode of core 102.More Specifically describe, the positive core 102 for performing TPM writes TPM states to special arbitrary access before abandoning implementing the TPM and deposited at present Reservoir 116, and adapter implementation TPM core 102 reads TPM state from special random access memory 116.Each The microcode of core 102 is configured as making when core 102 is intended to turn into the core 102 for performing TPM, and core 102 is by special random access memory The ownership of TPM hardware semaphores 118 is obtained before TPM states are read in device 116 first, and starts to perform TPM.Implement one In example, TPM generally conforms to the TPM specification issued by believable computing tissue (Trusted Computing Group), seems ISO/IEC11889 specifications.
As described above, the conventional solution to resource contention among multiple processors is a software semaphore held in system memory. A potential advantage of the hardware semaphore 118 described herein is that it avoids generating additional traffic on the memory bus and can be accessed faster than system memory.
Interrupt, non-sleep synchronization request
Referring now to Figure 21, a timing diagram is shown illustrating an example of the cores 102 issuing non-sleeping synchronization requests according to the flowchart of Figure 3. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.
Core 0 writes a SYNC 14 in which neither the sleep bit 212 nor the selective wake-up bit 214 is set (i.e., a non-sleeping synchronization request). Therefore, the control unit 104 allows core 0 to keep running (the "No" branch of decision block 312).
Core 1 eventually also writes a non-sleeping SYNC 14, and the control unit 104 allows core 1 to keep running. Finally, core 2 writes a non-sleeping SYNC 14. As shown, the times at which the cores write their SYNC 14 may differ.
When all the cores have written the non-sleeping SYNC 14, the control unit 104 simultaneously sends a sync interrupt to each of core 0, core 1, and core 2. Each core then receives the sync interrupt and services it (unless the sync interrupt is masked, in which case the microcode typically polls for the sync interrupt).
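The behavior in this timing example can be modeled in a few lines. The sketch below is a simplified software stand-in for the control unit 104 that handles only non-sleeping requests; the class name `ControlUnit`, the method `write_sync`, and the `interrupted` list are hypothetical names chosen for illustration.

```python
class ControlUnit:
    """Toy model: collect non-sleeping sync requests, then interrupt all cores."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.synced = set()       # cores that have written a sync request so far
        self.interrupted = []     # cores that received the most recent sync interrupt

    def write_sync(self, core_id):
        """Record a non-sleeping sync request.

        The requesting core keeps running in every case. The return value
        only reports whether this write completed the set and triggered
        the simultaneous sync interrupt.
        """
        self.synced.add(core_id)
        if len(self.synced) == self.num_cores:
            self.interrupted = sorted(self.synced)   # interrupt every core at once
            self.synced.clear()
            return True
        return False


cu = ControlUnit(3)
results = [cu.write_sync(core) for core in (0, 1, 2)]
```

Only the third write, from core 2, triggers the interrupt, which mirrors the timing diagram: cores 0 and 1 continue to run after their requests until the last core arrives.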
Designation of the bootstrap processor
In one embodiment, as described above, normally (for example, when the "all cores BSP" feature of Figure 23 is disabled) one core 102 designates itself the bootstrap processor (BSP) and performs designated tasks, such as booting the operating system. In one embodiment, normally (for example, when the "modify BSP" and "all cores BSP" features of Figures 22 and 23, respectively, are disabled) the core 102 whose virtual core number is zero is the BSP by default.
However, the inventors have observed that it may be advantageous to designate the BSP in a different manner, embodiments of which are described below. For example, many tests of a microprocessor 100 part, particularly manufacturing tests, are performed by booting an operating system and running program code to ensure that the part works properly. Because the BSP core 102 performs system initialization and boots the operating system, the BSP core 102 may be exercised in ways that the AP cores are not. Furthermore, it has been observed that even in multithreaded operating environments the BSP typically bears a larger share of the processing load than the APs; therefore, the AP cores 102 may not be tested as thoroughly as the BSP core 102. Finally, there may be certain actions that only the BSP core 102 performs on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described with respect to Figure 9.
Therefore, embodiments are described in which any core 102 may be designated the BSP. In one embodiment, during testing of the microprocessor 100, the tests are run N times, where N is the number of cores 102 of the microprocessor 100, and for each run the microprocessor 100 is reconfigured so that a different core 102 is the BSP. This may advantageously provide better test coverage during manufacturing and may also advantageously expose bugs in the microprocessor 100 during its design. Another advantage is that each core 102 has a different APIC ID on different runs and therefore responds to different interrupt requests, which may provide broader test coverage.
Referring now to Figure 22, a flowchart is shown illustrating a process for configuring the microprocessor 100. Figure 22 is described with reference to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described here may be used with a microprocessor 100 having a different configuration, namely one with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively and dynamically reconfigure the microprocessor 100. Flow begins at block 2202.
At block 2202, the microprocessor 100 is reset and performs an initial portion of its initialization, preferably in a manner similar to that described above with respect to Figure 14. However, the generation of the configuration-related values, as at block 1424 of Figure 14, and in particular the APIC ID and the BSP flag, is performed in the manner described in blocks 2203 through 2214. Flow proceeds to block 2203.
At block 2203, the core 102 generates its virtual core number, preferably as described with respect to Figure 14. Flow proceeds to decision block 2204.
At decision block 2204, the core 102 samples an indicator to determine whether a feature is enabled. The feature is referred to herein as the "modify BSP" feature. In one embodiment, blowing a fuse 114 enables the modify BSP feature. Preferably, during testing, the modify BSP feature fuse 114 is not blown; instead, a true value is scanned into the holding register bit associated with the modify BSP feature fuse 114, as described above with respect to Figure 1, to enable the modify BSP feature. In this manner, the modify BSP feature is not permanently enabled in the microprocessor 100 part, but is instead disabled after the next power-up. Preferably, the operations at blocks 2203 through 2214 are performed by the microcode of the core 102. If the modify BSP feature is enabled, flow proceeds to block 2205; otherwise, flow proceeds to block 2206.
At block 2205, the core 102 modifies the virtual core number generated at block 2203. In one embodiment, the core 102 modifies the virtual core number to be the result of a rotate function of the virtual core number generated at block 2203 and a rotate amount, as follows:
virtual core number = rotate(rotate amount, virtual core number).
The rotate function, in one embodiment, rotates the virtual core numbers among the cores 102 by the rotate amount. The rotate amount is a value blown into fuses 114 or, preferably, scanned into the holding register during testing. Table 1 shows the virtual core number of each core 102, identified in the left-hand column by the ordered pair (die number 258, local core number 256), for each rotate amount shown in the top row, in an example configuration in which the number of dies 406 is two, the number of cores 102 per die 406 is four, and all the cores 102 are enabled. In this manner, the tester is able to cause a core 102 to generate any valid value for its virtual core number and therefore for its APIC ID. Although one embodiment for modifying the virtual core number has been described, other embodiments are contemplated. For example, the rotate direction may be the opposite of that shown in Table 1. Flow proceeds to block 2206.
Table 1
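The rotate function of block 2205 can be sketched as modular addition over the set of cores. The sketch below is an illustration under stated assumptions and does not reproduce the values of Table 1: the rotate direction (adding the rotate amount), the die-major default numbering in `default_virtual_core_number`, and the function names are all assumptions.

```python
DIES = 2
CORES_PER_DIE = 4
NUM_CORES = DIES * CORES_PER_DIE   # eight cores, as in the example configuration


def default_virtual_core_number(die_number, local_core_number):
    """Assumed default numbering: die-major, all cores enabled."""
    return die_number * CORES_PER_DIE + local_core_number


def rotate(rotate_amount, virtual_core_number, num_cores=NUM_CORES):
    """Rotate a virtual core number by rotate_amount (direction is an assumption)."""
    return (virtual_core_number + rotate_amount) % num_cores


# One row per rotate amount; column i is the rotated number of default core i.
table = {
    amount: [rotate(amount, v) for v in range(NUM_CORES)]
    for amount in range(NUM_CORES)
}
```

For every rotate amount the result is a permutation of 0 through 7, so exactly one core ends up with virtual core number zero. Reversing the direction, as the text allows, would amount to subtracting the rotate amount instead.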
At block 2206, the core 102 populates its local APIC ID register with either the default virtual core number generated at block 2203 or the modified value generated at block 2205. In one embodiment, the APIC ID register may be read by the core 102 from itself (for example, by BIOS and/or the operating system) at memory address 0x0FEE00020. In another embodiment, the APIC ID register may be read by the core 102 at MSR address 0x802. Flow proceeds to decision block 2208.
At decision block 2208, the core 102 determines whether the APIC ID it populated at block 2206 is zero. If so, flow proceeds to block 2212; otherwise, flow proceeds to block 2214.
At block 2212, the core 102 sets its BSP flag to true to indicate that the core 102 is the BSP. In one embodiment, the BSP flag is a bit of the x86 APIC base register (IA32_APIC_BASE MSR) of the core 102. Flow proceeds to decision block 2216.
At block 2214, the core 102 sets its BSP flag to false to indicate that the core 102 is not the BSP, i.e., that it is an AP. Flow proceeds to decision block 2216.
At decision block 2216, the core 102 determines whether it is the BSP, i.e., whether it designated itself the BSP core 102 at block 2212 rather than an AP core 102 at block 2214. If so, flow proceeds to block 2218; otherwise, flow proceeds to block 2222.
At block 2218, the core 102 begins fetching and executing system initialization firmware (for example, BIOS bootstrap code). This may include instructions that relate to the BSP flag and the APIC ID, for example instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the values written at blocks 2206 and 2212/2214. It may also include the BSP, as the only core 102 of the microprocessor 100 to do so, performing operations on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described with respect to Figure 9. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally-defined reset vector. For example, in the x86 architecture, the reset vector points to address 0xFFFFFFF0. Preferably, executing the system initialization firmware includes booting the operating system, for example loading the operating system and transferring control to it. Flow proceeds to block 2224.
At block 2222, the core 102 halts itself and waits for a startup sequence from the BSP before it begins fetching and executing instructions. In one embodiment, the startup sequence received from the BSP includes an interrupt vector to AP system initialization firmware (for example, AP BIOS code). This may include instructions that relate to the BSP flag and the APIC ID, in which case the core 102 returns the values written at blocks 2206 and 2212/2214. Flow proceeds to block 2224.
At block 2224, as the core 102 executes instructions, it receives and responds to interrupt requests based on the APIC ID written to its APIC ID register at block 2206. Flow ends at block 2224.
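The per-core decisions of Figure 22 can be condensed into one function. This is a rough sketch rather than the microcode flow itself: the function name `configure_core`, its parameters, the returned role strings, and the rotate direction are assumptions; the block numbers in the comments refer to the flowchart described above.

```python
def configure_core(virtual_core_number, modify_bsp=False, rotate_amount=0, num_cores=8):
    """Return (apic_id, bsp_flag, role) for one core.

    virtual_core_number is the default value produced at block 2203.
    """
    if modify_bsp:                                        # decision block 2204
        # Block 2205: rotate the virtual core number (direction is an assumption).
        virtual_core_number = (virtual_core_number + rotate_amount) % num_cores
    apic_id = virtual_core_number                         # block 2206
    bsp_flag = apic_id == 0                               # blocks 2208, 2212, 2214
    if bsp_flag:                                          # decision block 2216
        role = "fetch firmware at reset vector"           # block 2218
    else:
        role = "halt and wait for startup from BSP"       # block 2222
    return apic_id, bsp_flag, role


# Run the "test" once per rotate amount and record which default core became the BSP.
bsp_per_run = [
    next(v for v in range(8) if configure_core(v, True, amount)[1])
    for amount in range(8)
]
```

With eight rotate amounts, each of the eight cores is the BSP in exactly one run, which is the property the N-run test procedure relies on.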
As described above, according to one embodiment, the core 102 whose virtual core number is zero is the BSP by default. However, the inventors have observed that there may be situations in which it is advantageous to designate all the cores 102 the BSP, embodiments of which are described below. For example, the developers of the microprocessor 100 may have invested a significant amount of time and cost to develop a large body of tests designed to run single-threaded on a single core, and the developers would like to use these single-core tests to test the multi-core microprocessor 100. For example, the tests may run under the old and well-known DOS operating system in x86 real mode.
These tests could be run on each core 102 in a serial fashion, using the modify BSP feature described with respect to Figure 22 and/or by blowing fuses, or scanning modified fuse values into the holding register, to disable all the cores 102 except the one core 102 under test. However, the inventors have recognized that this takes more time than running the tests on all the cores 102 concurrently (for example, approximately four times as long in the case of a four-core microprocessor 100). Moreover, the time required to test each individual microprocessor 100 part is precious, particularly when hundreds of thousands or more microprocessor 100 parts are manufactured, and particularly when many of the tests run on very expensive test equipment.
Additionally, there may be speed paths in the logic of the microprocessor 100 that are stressed more when more than one core 102 (or all the cores 102) is running at the same time, because doing so generates more heat and/or draws more power. Running the tests in the serial fashion may not generate the additional stress that exposes such speed paths.
Therefore, embodiments are described in which all the cores 102 may be dynamically designated the BSP core 102 so that all the cores 102 can execute a test concurrently.
Referring now to Figure 23, a flowchart is shown illustrating a process for configuring the microprocessor 100 according to another embodiment. Figure 23 is described with reference to the multi-die microprocessor 100 of Figure 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described here may be used with a microprocessor 100 having a different configuration, namely one with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively and dynamically reconfigure the microprocessor 100. Flow begins at block 2302.
At block 2302, the microprocessor 100 is reset and performs an initial portion of its initialization, preferably in a manner similar to that described above with respect to Figure 14. However, the generation of the configuration-related values, as at block 1424 of Figure 14, and in particular the APIC ID and the BSP flag, is performed in the manner described in blocks 2304 through 2312. Flow proceeds to decision block 2304.
At decision block 2304, the core 102 detects whether a feature is enabled. The feature is referred to herein as the "all cores BSP" feature. Preferably, blowing a fuse 114 enables the all cores BSP feature. Preferably, during testing, the all cores BSP feature fuse 114 is not blown; instead, a true value is scanned into the holding register bit associated with the all cores BSP feature fuse 114, as described above with respect to Figure 1, to enable the all cores BSP feature. In this manner, the all cores BSP feature is not permanently enabled in the microprocessor 100 part, but is instead disabled after the next power-up. Preferably, the operations at blocks 2304 through 2312 are performed by the microcode of the core 102. If the all cores BSP feature is enabled, flow proceeds to block 2305; otherwise, flow proceeds to block 2203 of Figure 22.
At block 2305, the core 102 sets its virtual core number to zero, regardless of the local core number 256 and die number 258 of the core 102. Flow proceeds to block 2306.
At block 2306, the core 102 populates its local APIC ID register with the zero-valued virtual core number set at block 2305. Flow proceeds to block 2312.
At block 2312, the core 102 sets its BSP flag to true to indicate that the core 102 is the BSP, regardless of the local core number 256 and die number 258 of the core 102. Flow proceeds to block 2315.
At block 2315, whenever a core 102 performs a memory access request, the microprocessor 100 modifies the upper address bits of the memory access request address differently for each core 102, so that each core 102 accesses its own separate memory space. That is, depending on which core 102 made the memory access request, the microprocessor 100 modifies the upper address bits so that they have a value unique to each core 102. In one embodiment, the microprocessor 100 modifies the upper address bits as specified by values blown into fuses 114. In another embodiment, the microprocessor 100 modifies the upper address bits based on the local core number 256 and die number 258 of the core 102. For example, in an embodiment in which the microprocessor 100 has four cores, the microprocessor 100 modifies the upper two bits of the memory address and generates a unique value on the upper two bits for each core 102. In effect, the memory space addressable by the microprocessor 100 is divided into N subspaces, where N is the number of cores 102. The test programs are developed so that they restrict themselves to addresses in the lowest of the N subspaces. For example, assume the microprocessor 100 can address 64 GB of memory and includes four cores 102. The tests are developed to access only the lowest 8 GB of memory. When core 0 executes an instruction that accesses memory address A (which is in the lower 8 GB of memory), the microprocessor 100 generates address A (unmodified) on the memory bus; when core 1 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+8GB on the memory bus; when core 2 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+16GB on the memory bus; and when core 3 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+32GB on the memory bus. In this manner, advantageously, the cores 102 do not conflict with one another in their memory accesses, which allows the tests to execute correctly. Preferably, the single-threaded tests are executed on a stand-alone test machine that can test the microprocessor 100 in isolation. The developers of the microprocessor 100 develop test data that the test machine supplies to the microprocessor 100; conversely, the developers develop result data that the test machine compares with the data written by the microprocessor 100 during memory write accesses, to ensure that the microprocessor 100 writes the correct data. In one embodiment, the shared cache memory 119 (i.e., the highest-level cache, which generates the addresses used in external bus transactions) is the portion of the microprocessor 100 that is configured to modify the upper address bits when the all cores BSP feature is enabled. Flow proceeds to block 2318.
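The upper-address-bit modification can be sketched as follows for the four-core embodiment that replaces the upper two address bits. The sketch assumes a 36-bit (64 GB) address space and uses the core number directly as the unique upper-bit value; the function name `bus_address` and these constants are assumptions. Under these assumptions each subspace is 16 GB, so the per-core offsets come out as multiples of 16 GB and are not the offsets quoted in the worked example above.

```python
GB = 1 << 30
ADDRESS_BITS = 36     # assumed: 64 GB of addressable memory
CORE_BITS = 2         # upper two bits distinguish four cores
LOW_BITS = ADDRESS_BITS - CORE_BITS   # bits left for the test's own addresses


def bus_address(core_id, address, all_cores_bsp=True):
    """Return the address driven on the memory bus for a core's access."""
    if not all_cores_bsp:
        return address                          # feature disabled: address unmodified
    if not 0 <= address < (1 << LOW_BITS):
        raise ValueError("test address must stay in the lowest subspace")
    if not 0 <= core_id < (1 << CORE_BITS):
        raise ValueError("core_id out of range")
    return (core_id << LOW_BITS) | address      # unique upper bits per core


A = 5 * GB
addresses = [bus_address(core, A) for core in range(4)]
```

Every core issues the same test address A, yet the four bus addresses are distinct, so identical single-threaded tests can run on all cores without touching one another's memory.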
At block 2318, the core 102 begins fetching and executing system initialization firmware (for example, BIOS bootstrap code). This may include instructions that relate to the BSP flag and the APIC ID, for example instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the zero value written at block 2306. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally-defined reset vector. For example, in the x86 architecture, the reset vector points to address 0xFFFFFFF0. Preferably, executing the system initialization firmware includes booting the operating system, for example loading the operating system and transferring control to it. Flow proceeds to block 2324.
At block 2324, as the core 102 executes instructions, it receives and responds to interrupt requests based on the zero-valued APIC ID written to its APIC ID register at block 2306. Flow ends at block 2324.
Although an embodiment has been described with respect to Figure 23 in which all the cores 102 are designated the BSP, other embodiments are contemplated in which multiple, but fewer than all, of the cores 102 are designated the BSP.
Although embodiments have been described in the context of an x86-style system, in which each core 102 uses a local APIC and there is a relationship between the local APIC ID and the BSP designation, it should be understood that the designation of the bootstrap processor is not limited to x86 embodiments and may be employed in systems having different system architectures.
Propagation of microcode patches to multiple cores
As observed previously, there may be many important functions performed largely by the microcode of a microprocessor, and in particular functions that require correct communication and coordination among the instances of the microcode executing on the multiple cores of the microprocessor. Because of the complexity of the microcode, there is a significant probability that bugs will exist in the microcode that need to be fixed. This may be accomplished via a microcode patch, in which new microcode instructions replace the old microcode instructions that caused the bug. That is, the microprocessor includes special hardware that facilitates the patching of microcode. Generally, it is desirable to apply the microcode patch to all the cores of the microprocessor. Conventionally, this has been done by separately executing an architectural instruction on each core to apply the patch. However, the conventional approach may be problematic.
First, the patch may pertain to inter-core communication by the instances of the microcode (for example, core synchronization or hardware semaphore use) or to features that require inter-core communication by the microcode (for example, cross-core debug requests, cache control operations, power management, or dynamic multi-core microprocessor configuration). Executing the architectural patch application instruction separately on each core may create a window of time in which the microcode patch has been applied to some cores but not to others (or in which a previous patch is applied to some cores and the new patch to others). This may cause inter-core communication failures and incorrect operation of the microprocessor. Other problems, both foreseen and unforeseen, may also arise if not all the cores of the microprocessor use the same microcode patch.
Second, the architecture of the microprocessor specifies many features that may be supported by some instances of the microprocessor and not by others. During operation, the microprocessor can communicate to system software which particular features it supports. For example, in the case of an x86 architecture microprocessor, the x86 CPUID instruction may be executed by system software to determine the supported feature set. However, the instruction that determines the feature set (for example, CPUID) executes separately on each core of the microprocessor. In some cases, a feature may be disabled because a bug existed at the time the microprocessor was released. Subsequently, however, a microcode patch that fixes the bug may be developed so that the feature can be enabled after the patch is applied. If the patch is applied in the conventional manner (i.e., applied separately to each core by a separate execution of the patch application instruction on each core), different cores may indicate different feature sets at a given point in time, depending on whether the patch has yet been applied to the core. This may be problematic, particularly if system software (such as the operating system, for example to facilitate thread migration between cores) expects all the cores of the microprocessor to have the same feature set. In particular, it has been observed that some system software obtains the feature set of only one core and assumes that the other cores have the same feature set.
Third, the instance of the microcode on each core controls and/or communicates with uncore resources that are shared by the cores (for example, synchronization-related hardware, hardware semaphores, shared PRAM, shared cache, or the service unit). Therefore, it may generally be problematic for the microcode of two different cores to control or communicate with an uncore resource simultaneously in two different ways because one of the cores has the microcode patch applied and the other does not (or because the two cores have different microcode patches applied).
Finally, the microcode patch hardware of the microprocessor may be such that applying the patch in the conventional manner could cause the patch application by one core to interfere with the operation of the patch on another core, for example if portions of the patch hardware are shared among the cores.
Preferably, embodiments are described herein that apply a microcode patch to a multi-core microprocessor in an atomic manner at the architectural instruction level in order to address the problems described above. First, the patch is applied to the entire microprocessor 100 in response to the execution of a single architectural instruction on a single core 102. That is, the embodiments do not require system software to execute an apply microcode patch instruction (described below) on each core 102. More specifically, the single core 102 that encounters the apply microcode patch instruction sends messages to and interrupts the other cores 102 to cause their instances of the microcode to apply their portions of the patch, and all the microcode instances cooperate with one another such that the microcode patch is applied to the microcode patch hardware of each core 102 and to the shared patch hardware of the microprocessor 100 while interrupts are disabled on all the cores 102. Second, the microcode instances running on all the cores 102 that implement the atomic patch application mechanism cooperate with one another such that they refrain from executing any architectural instructions (other than the one apply microcode patch instruction) after all the cores 102 of the microprocessor 100 have agreed to apply the patch and until all the cores 102 have done so. That is, no core 102 executes architectural instructions while any core 102 is applying the microcode patch. Furthermore, in a preferred embodiment, all the cores 102 reach the same place in the microcode that performs the patch application with interrupts disabled, and thereafter the cores 102 execute only the microcode instructions that apply the patch until all the cores 102 of the microprocessor 100 have confirmed that the patch has been applied. That is, no core 102 executes microcode instructions other than those that apply the microcode patch while any core 102 of the microprocessor 100 is applying the patch.
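The two-phase cooperation described above (all cores agree, all cores apply, and only then does any core resume) can be illustrated with a software rendezvous. This is a loose analogy and not the microcode mechanism: threads stand in for cores, `threading.Barrier` stands in for the inter-core synchronization hardware, the message and interrupt step that brings the other cores into the handler is omitted, and `patch_handler`, `patch_version`, and `events` are hypothetical names.

```python
import threading

NUM_CORES = 4
rendezvous = threading.Barrier(NUM_CORES)   # stand-in for inter-core synchronization
patch_version = [0] * NUM_CORES             # patch level as seen by each core
events = []                                 # global order of (core_id, action)
events_lock = threading.Lock()


def log(core_id, action):
    with events_lock:
        events.append((core_id, action))


def patch_handler(core_id, new_version):
    """Handler each core runs once it has been told to apply the patch."""
    rendezvous.wait()                        # phase 1: every core has agreed to patch
    patch_version[core_id] = new_version     # apply this core's portion of the patch
    log(core_id, "applied")
    rendezvous.wait()                        # phase 2: every core confirms it has applied
    log(core_id, "resumed")                  # only now may normal execution continue


threads = [
    threading.Thread(target=patch_handler, args=(core, 1))
    for core in range(NUM_CORES)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
actions = [action for _, action in events]
```

Because no thread passes the second rendezvous until every thread has logged "applied", every "applied" entry precedes every "resumed" entry, which mirrors the requirement that no core resumes execution while any core is still applying the patch.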
Referring now to Figure 24, a block diagram is shown illustrating a multi-core microprocessor 100 according to another embodiment. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 1. However, the microprocessor 100 of Figure 24 additionally includes in its uncore 103 a service processing unit (SPU) 2423, an SPU start address register 2497, an uncore microcode read-only memory (ROM) 2425, and an uncore microcode patch random access memory (RAM) 2408. Additionally, each core 102 includes a core PRAM 2499, a patch content-addressable memory (CAM) 2439, and a core microcode ROM 2404.
Microcode comprises microcode instructions. The microcode instructions are non-architectural instructions stored in one or more memories of the microprocessor 100 (for example, the uncore microcode ROM 2425, the uncore microcode patch RAM 2408, and/or the core microcode ROM 2404); they are fetched by a core 102 based on a fetch address held in a non-architectural micro-program counter (micro-PC) and are used by the core 102 to implement the instructions of the instruction set architecture (ISA) of the microprocessor 100. Preferably, the microcode instructions are translated by a microtranslator into microinstructions that are executed by the execution units of the core 102; in an alternate embodiment, the microcode instructions are executed directly by the execution units, in which case the microcode instructions are microinstructions. That the microcode instructions are non-architectural means that they are not instructions of the ISA of the microprocessor 100 but are instead encoded according to an instruction set distinct from the architectural instruction set. The non-architectural micro-program counter is not defined by the ISA of the microprocessor 100 and is distinct from the architecturally-defined program counter of the core 102. The microcode implements some or all of the instructions of the ISA instruction set of the microprocessor as follows. In response to decoding an ISA instruction that is implemented in microcode, the core 102 transfers control to a microcode routine associated with the ISA instruction. The microcode routine comprises microcode instructions. The execution units execute the microcode instructions or, according to the preferred embodiment, the microcode instructions are further translated into microinstructions that are executed by the execution units. The results of the execution of the microcode instructions (or of the microinstructions translated from them) by the execution units are the results defined by the ISA instruction. Thus, the collective execution by the execution units of the microcode routine associated with the ISA instruction (or of the microinstructions translated from the microcode routine instructions) "implements" the ISA instruction. That is, the collective execution by the execution units of the microcode instructions (or of the microinstructions translated from them) performs the operation specified by the ISA instruction on the inputs specified by the ISA instruction to produce a result defined by the ISA instruction. Additionally, microcode instructions may be executed (or translated into microinstructions that are executed) when the microprocessor is reset in order to configure the microprocessor.
The core microcode ROM 2404 holds microcode executed by the particular core 102 that includes it. The uncore microcode ROM 2425 also holds microcode executed by the cores 102; however, in contrast to the core ROM 2404, the uncore ROM 2425 is shared by the cores 102. Preferably, because the access time of the uncore ROM 2425 is greater than that of the core ROM 2404, the uncore ROM 2425 holds microcode routines that require less performance and/or are executed less frequently. Additionally, the uncore ROM 2425 holds code that is fetched and executed by the SPU 2423.
The uncore microcode patch RAM 2408 is also shared by the cores 102 and holds microcode instructions executed by the cores 102. The patch CAM 2439 holds patch addresses: when a microcode fetch address matches the contents of one of the entries in the patch CAM 2439, the patch CAM 2439 outputs the corresponding patch address to a microsequencer in response to the fetch address. In that case, the microsequencer outputs the patch address as the microcode fetch address, rather than the next sequential fetch address (or the target address in the case of a branch-type instruction), and in response the uncore patch RAM 2408 outputs a patch microcode instruction. Thus, a patch microcode instruction is fetched from the uncore patch RAM 2408, rather than a microcode instruction from the uncore ROM 2425 or core ROM 2404, for example because the patched microcode instruction and/or the microcode instructions following it are the source of a bug. The patch microcode instruction thereby effectively replaces, or patches, the undesirable microcode instruction that resides in the core ROM 2404 or uncore microcode ROM 2425 at the original microcode fetch address. Preferably, the patch CAM 2439 and patch RAM 2408 are loaded in response to an architectural instruction included in system software, such as BIOS or the operating system running on the microprocessor 100.
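The redirection performed by the patch CAM can be sketched with ordinary dictionaries. This is a simplified software model: the dictionaries `CORE_ROM`, `PATCH_CAM`, and `PATCH_RAM`, the addresses, and the instruction names are invented for illustration, and the microsequencer is reduced to a single `fetch` function.

```python
# Hypothetical contents: microcode ROM indexed by fetch address.
CORE_ROM = {
    0x100: "rom_op_a",
    0x101: "rom_op_buggy",
    0x102: "rom_op_c",
}

# Patch CAM: maps a matching fetch address to a patch address.
PATCH_CAM = {0x101: 0x10}

# Patch RAM: holds the replacement microcode instruction at the patch address.
PATCH_RAM = {0x10: "patch_op_fixed"}


def fetch(fetch_address):
    """Return the microcode instruction delivered for a fetch address."""
    patch_address = PATCH_CAM.get(fetch_address)
    if patch_address is not None:
        # CAM hit: the patch address replaces the fetch address,
        # so the instruction comes from the patch RAM.
        return PATCH_RAM[patch_address]
    # CAM miss: the instruction comes from the ROM as usual.
    return CORE_ROM[fetch_address]


sequence = [fetch(address) for address in (0x100, 0x101, 0x102)]
```

Only the fetch at the patched address is redirected; the neighboring ROM instructions are delivered unchanged, which is why loading the CAM and RAM is enough to replace an individual microcode instruction without altering the ROM.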
Among other things, the uncore PRAM 116 is used by the microcode to store values used by the microcode. Some of these values function effectively as constants: they are immediate values stored in the core microcode ROM 2404 or the uncore microcode ROM 2425, or blown into the fuses 114 when the microprocessor 100 is manufactured, that the microcode writes to the uncore PRAM 116 when the microprocessor 100 is reset and that are not modified during operation of the microprocessor 100, except possibly via a patch or in response to the execution of an instruction that explicitly modifies the value (for example, a WRMSR instruction). Advantageously, these values can be modified via the patch mechanism described herein without changing the core microcode ROM 2404 or the uncore microcode ROM 2425, which could be very costly, and without blowing one or more of the unblown fuses 114.
In addition, the uncore PRAM 116 is used to hold patch code that is fetched and executed by the SPU 2423, as described herein.
The core PRAM 2499, like the uncore PRAM 116, is private, or non-architectural, meaning that the core PRAM 2499 is not in the architectural user program address space of the microprocessor 100. However, unlike the uncore PRAM 116, each core PRAM 2499 is read only by its respective core 102 and is not shared with the other cores 102. Like the uncore PRAM 116, the core PRAM 2499 is also used by the microcode to store values used by the microcode. Advantageously, these values can be modified via the patch mechanism described herein without changing the core microcode ROM 2404 or the uncore microcode ROM 2425.
The SPU 2423 comprises a stored-program processor that is an adjunct to, and distinct from, each of the cores 102. Although the cores 102 are architecturally visible for executing instructions of the ISA of the cores 102 (for example, x86 ISA instructions), the SPU 2423 is not. Thus, for example, the operating system cannot run on the SPU 2423, nor can the operating system schedule programs of the ISA of the cores 102 (for example, x86 ISA programs) to run on the SPU 2423. In other words, the SPU 2423 is not a system resource managed by the operating system. Rather, the SPU 2423 performs operations used to debug the microprocessor 100. In addition, the SPU 2423 may assist in measuring the performance of the cores 102 and in other functions. Preferably, the SPU 2423 is smaller, less complex, and less power-consuming than the cores 102 (for example, in one embodiment, the SPU 2423 includes built-in clock gating). In one embodiment, the SPU 2423 comprises a FORTH CPU core.
Asynchronous events can occur that the debug microcode executed by the cores 102 cannot handle well. Advantageously, however, a core 102 can command the SPU 2423 to detect such events and, in response to detecting them, to perform actions such as creating a log or modifying aspects of the behavior of the cores 102 and/or of the external bus interface of the microprocessor 100. The SPU 2423 can provide the log information to the user, and it can also interact with the tracer to request that the tracer provide the log information or perform other actions. In one embodiment, the SPU 2423 has access to the control registers of the memory subsystem and of the programmable interrupt controller of each core 102, as well as the control registers of the shared cache memory 119.
Examples of events that the SPU 2423 can detect include the following: (1) a core 102 is hung, that is, the core 102 has not retired any instruction for a programmable number of clock cycles; (2) a core 102 loads data from an uncacheable region of memory; (3) the temperature of the microprocessor 100 changes; (4) the operating system requests a change in the bus clock ratio of the microprocessor 100 and/or a change in the voltage level of the microprocessor 100; (5) the microprocessor 100 changes its voltage level and/or bus clock ratio of its own accord, for example to save power or improve performance; (6) an internal timer of a core 102 expires; (7) a cache snoop hits a modified cache line and causes the cache line to be written back to memory; (8) the temperature, voltage, or bus clock ratio of the microprocessor 100 goes outside a respective range; (9) an external trigger signal is asserted by a user on an external pin of the microprocessor 100.
Advantageously, because the SPU 2423 runs its code 132 independently of the cores 102, it does not have the same limitations as the tracer microcode that executes on the cores 102. The SPU 2423 can therefore detect, or be notified of, events independently of the instruction execution boundaries of the cores 102 and without disrupting the state of the cores 102.
The SPU 2423 has its own code that it executes. The SPU 2423 can fetch its code from the uncore microcode ROM 2425 or from the uncore PRAM 116. That is, preferably, the SPU 2423 shares the uncore ROM 2425 and the uncore PRAM 116 with the microcode that runs on the cores 102. The SPU 2423 uses the uncore PRAM 116 to store its data, including the log. In one embodiment, the SPU 2423 also includes its own serial port interface, through which it can transmit the log to an external device. Advantageously, the SPU 2423 can also instruct the tracer running on a core 102 to store the log information from the uncore PRAM 116 into system memory.
The SPU 2423 communicates with the cores 102 through status and control registers. The SPU status register includes a bit corresponding to each of the events described above that the SPU 2423 can detect. To notify the SPU 2423 of an event, a core 102 sets the bit in the SPU status register that corresponds to the event. Some of the event bits are set by hardware of the microprocessor 100 and some are set by the microcode of the cores 102. The SPU 2423 reads the status register to determine the list of events that have occurred. One of the control registers includes a bit corresponding to each action that the SPU 2423 should take in response to detecting one of the events specified in the status register. That is, a set of action bits exists in the control register for each possible event in the status register. In one embodiment, there are 16 action bits per event. In one embodiment, writing the status register to indicate an event causes the SPU 2423 to be interrupted, in response to which the SPU 2423 reads the status register to determine which events have occurred. Advantageously, this saves power by reducing the need for the SPU 2423 to poll the status register. The status register and the control registers can also be read and written by user programs that execute instructions such as RDMSR and WRMSR.
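The relationship between event bits and action bits can be illustrated with a small Python model. The class `SpuRegisters`, its method names, and in particular the packing of the action fields (event n occupying bits 16n to 16n+15 of a single control value) are assumptions made for the sketch; the description only states that each event has its own set of action bits, 16 in one embodiment.

```python
# Illustrative model of SPU status/control registers. The field layout is an
# assumption: event n uses bits [16n, 16n+15] of the control value.

ACTION_BITS_PER_EVENT = 16  # per the embodiment described above
FIELD_MASK = (1 << ACTION_BITS_PER_EVENT) - 1


class SpuRegisters:
    def __init__(self):
        self.status = 0   # one bit per detectable event
        self.control = 0  # one 16-bit action field per event

    def set_event(self, event):
        """A core or hardware marks an event as having occurred."""
        self.status |= 1 << event

    def set_actions(self, event, action_mask):
        """Select which actions the SPU takes when the event is detected."""
        shift = event * ACTION_BITS_PER_EVENT
        self.control &= ~(FIELD_MASK << shift)           # clear the old field
        self.control |= (action_mask & FIELD_MASK) << shift

    def pending_events(self):
        """Return the indices of all event bits currently set."""
        events, bit, status = [], 0, self.status
        while status:
            if status & 1:
                events.append(bit)
            status >>= 1
            bit += 1
        return events

    def actions_for(self, event):
        """Return the 16-bit action mask configured for an event."""
        return (self.control >> (event * ACTION_BITS_PER_EVENT)) & FIELD_MASK


regs = SpuRegisters()
regs.set_actions(3, 0b0101)  # hypothetical: actions 0 and 2 for event 3
regs.set_event(0)
regs.set_event(3)
```

With this state, `regs.pending_events()` lists events 0 and 3, and `regs.actions_for(3)` returns the mask configured for event 3, while event 0 has no actions selected.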
The set of actions that the SPU 2423 can perform in response to detecting an event includes the following. (1) Write the log information to the uncore PRAM 116. For each of the log-writing actions, multiple action bits exist so that the programmer can specify that only particular subsets of the log information are to be written. (2) Write the log information from the uncore PRAM 116 to the serial port interface. (3) Write to one of the control registers to set an event for the tracer. That is, the SPU 2423 can interrupt a core 102 and cause the tracer microcode to be invoked to perform a set of actions associated with the event. The actions can be specified by the user beforehand. In one embodiment, when the SPU 2423 writes the control register to set the event, this causes the core 102 to take a machine check exception, and the machine check exception handler checks whether the tracer is activated. If so, the machine check exception handler transfers control to the tracer. The tracer reads the control register and, if the events set in the control register are events that the user has enabled for the tracer, the tracer performs the actions that the user previously associated with those events. For example, the SPU 2423 can set an event to cause the tracer to write the log information stored in the uncore PRAM 116 to system memory. (4) Write to a control register to cause the microcode to branch to a microcode address specified by the SPU 2423. This is particularly helpful when the microcode is in an infinite loop such that the tracer cannot perform any meaningful action, yet the core 102 is still executing and retiring instructions, which means that the processor-hung event will not occur. (5) Write to a control register to cause a core 102 to be reset. As mentioned above, the SPU 2423 can detect that a core 102 is hung (that is, has not retired any instruction for some programmable amount of time) and reset the core 102. The reset microcode can check whether the reset was initiated by the SPU 2423 and, if so, advantageously write the log information out to system memory before the log information is cleared during initialization of the core 102. (6) Continuously log events. In this mode, rather than waiting to be interrupted by an event, the SPU 2423 spins in a loop that checks the status register and continuously logs information associated with the indicated events to the uncore PRAM 116, and may optionally also write the log information to the serial port interface. (7) Write to a control register to stop a core 102 from issuing requests to the shared cache memory 119 and/or to stop the shared cache memory 119 from acknowledging requests to the cores 102. This is particularly useful for debugging design bugs related to the memory subsystem, such as page tablewalk hardware bugs, and even for fixing such bugs during operation of the microprocessor 100, for example through a patch of the SPU 2423 code, as described below. (8) Write to control registers of an external bus interface controller of the microprocessor 100 to perform transactions on the external system bus, such as special cycles or memory read/write cycles. (9) Write to control registers of the programmable interrupt controller of a core 102, for example to generate an interrupt to another core 102, to simulate an I/O device to the cores 102, or to fix a bug in the interrupt controller. (10) Write to control registers of the shared cache memory 119 to control its size, for example to disable or enable different ways of the associative shared cache memory 119. (11) Write to control registers of the various functional units of the cores 102 to configure different performance features, such as branch prediction and data prefetch algorithms. As described below, the SPU 2423 code can advantageously be patched, which enables the SPU 2423 to perform actions such as those described herein to remedy design flaws, or to perform other functions, even after the design of the microprocessor 100 has been completed and the microprocessor 100 has been manufactured.
The SPU start address register 2497 holds the address at which the SPU 2423 begins fetching instructions when it comes out of reset. The SPU start address register 2497 is written by the cores 102. The address may be located in either the uncore PRAM 116 or the uncore microcode ROM 2425.
Referring to Figure 25, a block diagram illustrating the structure of a microcode patch 2500 according to one embodiment of the invention is shown. In the embodiment of Figure 25, the microcode patch 2500 includes the following portions: a header 2502; an immediate patch 2504; a checksum 2506 of the immediate patch 2504; CAM data 2508; a core PRAM patch 2512; a checksum 2514 of the CAM data 2508 and the core PRAM patch 2512; a RAM patch 2516; an uncore PRAM patch 2518; and a checksum 2522 of the RAM patch 2516 and the uncore PRAM patch 2518. The checksums 2506/2514/2522 enable the microprocessor 100 to verify the integrity of the respective portions of the patch after they have been loaded into the microprocessor 100. Preferably, the portions of the microcode patch 2500 are read from system memory and/or from a non-volatile system, for example from a ROM or FLASH memory that holds a system BIOS or extensible firmware. The header 2502 describes each portion of the patch 2500, such as its size, the location in its respective patch-related memory to which the portion is to be loaded, and a valid flag that indicates whether the portion contains a valid patch to be applied to the microprocessor 100.
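As a rough illustration of how the three checksums partition the patch, the following Python sketch groups the portions and verifies each group. Several things here are assumptions: the checksum algorithm (a simple 32-bit additive sum, since the description does not specify one), the field and function names, the omission of the header 2502, and the grouping of checksum 2522 with the RAM patch and uncore PRAM patch, which is assumed by analogy with checksum 2514.

```python
# Illustrative layout of a microcode patch with per-group checksums.
# The additive 32-bit checksum and all field names are assumptions.
from dataclasses import dataclass, replace


def checksum32(*parts: bytes) -> int:
    """Sum all bytes of all parts, truncated to 32 bits (assumed algorithm)."""
    total = 0
    for part in parts:
        total = (total + sum(part)) & 0xFFFFFFFF
    return total


@dataclass(frozen=True)
class MicrocodePatch:
    immediate_patch: bytes     # 2504
    checksum_2506: int         # covers the immediate patch
    cam_data: bytes            # 2508
    core_pram_patch: bytes     # 2512
    checksum_2514: int         # covers CAM data + core PRAM patch
    ram_patch: bytes           # 2516
    uncore_pram_patch: bytes   # 2518
    checksum_2522: int         # assumed: RAM patch + uncore PRAM patch

    def verify(self) -> dict:
        """Recompute each checksum and compare it with the stored value."""
        return {
            "immediate": checksum32(self.immediate_patch) == self.checksum_2506,
            "cam_core_pram": checksum32(self.cam_data, self.core_pram_patch)
            == self.checksum_2514,
            "ram_uncore_pram": checksum32(self.ram_patch, self.uncore_pram_patch)
            == self.checksum_2522,
        }


imm, cam, cpram, ram, upram = b"\x01\x02", b"\x10", b"\x20\x21", b"\x30", b"\x40"
good = MicrocodePatch(imm, checksum32(imm), cam, cpram, checksum32(cam, cpram),
                      ram, upram, checksum32(ram, upram))
tampered = replace(good, cam_data=b"\x11")  # corrupt one portion only
good_result = good.verify()
tampered_result = tampered.verify()
```

Because each checksum covers only its own group, corrupting the CAM data in `tampered` fails only the second check, so a loader modelled this way could reject one stage without re-verifying the others.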
The immediate patch 2504 comprises code (that is, instructions, preferably microcode instructions) to be loaded into the uncore microcode patch RAM 2408 of Figure 24 (for example, at block 2612 of Figures 26A-26B) and then executed by each of the cores 102 (for example, at block 2616 of Figures 26A-26B). The patch 2500 also specifies the address in the patch RAM 2408 to which the immediate patch 2504 is to be loaded. Preferably, the immediate patch 2504 code modifies default values written by the reset microcode, such as values written to configuration registers that affect the configuration of the microprocessor 100. After the immediate patch 2504 has been executed by each core 102 out of the patch RAM 2408, it is not executed again. Furthermore, the subsequent loading of the RAM patch 2516 into the patch RAM 2408 (for example, at block 2632 of Figures 26A-26B) may overwrite the immediate patch 2504 in the patch RAM 2408.
The RAM patch 2516 comprises patch microcode instructions to be executed in place of the microcode instructions in the core ROM 2404 or the uncore ROM 2425 that need to be patched. The RAM patch 2516 also includes the address of the location in the patch RAM 2408 to which the patch microcode instructions are to be written when the patch 2500 is applied (for example, at block 2632 of Figures 26A-26B). The CAM data 2508 is loaded into the patch CAM 2439 of each core 102 (for example, at block 2626 of Figures 26A-26B). As described above with respect to the operation of the patch CAM 2439, the CAM data 2508 includes one or more entries, each of which comprises a pair of microcode fetch addresses. The first address is that of the microcode instruction to be fetched, and is the content that is matched against the fetch address. The second address points to the location in the patch RAM 2408 that holds the patch microcode instruction to be executed in place of the microcode instruction being patched.
Unlike the immediate patch 2504, the RAM patch 2516 remains in the patch RAM 2408 and (together with the operation of the patch CAM 2439 according to the CAM data 2508) continues to function to patch the core microcode ROM 2404 and/or the uncore microcode ROM 2425 until it is replaced by another patch 2500 or the microprocessor 100 is reset.
The core PRAM patch 2512 includes data to be written to the core PRAM 2499 of each core 102 and the address within the core PRAM 2499 to which each item of the data is to be written (for example, at block 2626 of Figures 26A-26B). The uncore PRAM patch 2518 includes data to be written to the uncore PRAM 116 and the address within the uncore PRAM 116 to which each item of the data is to be written (for example, at block 2632 of Figures 26A-26B).
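Since both PRAM patches are described as data items paired with target addresses, applying them can be pictured as a list of address/value writes. The sketch below is a minimal model under that reading; the function name `apply_pram_patch`, the list-based PRAMs, their sizes, and all addresses and values are hypothetical.

```python
# Illustrative application of PRAM patches as (address, value) writes.
# PRAM sizes, addresses and values are made up for the example.

def apply_pram_patch(pram, patch_items):
    """Write each patch value to its target address in the given PRAM."""
    for address, value in patch_items:
        pram[address] = value


NUM_CORES = 3
core_prams = [[0] * 8 for _ in range(NUM_CORES)]  # one private PRAM per core
uncore_pram = [0] * 8                             # one PRAM shared by all cores

core_pram_patch = [(2, 0xAA), (5, 0xBB)]
uncore_pram_patch = [(1, 0x11)]

# Every core applies the core PRAM patch to its own private PRAM.
for pram in core_prams:
    apply_pram_patch(pram, core_pram_patch)

# The shared uncore PRAM is written once.
apply_pram_patch(uncore_pram, uncore_pram_patch)
```

The asymmetry in the loop reflects the description: the core PRAM patch must reach the private PRAM of every core, whereas the uncore PRAM patch is written a single time to the shared memory.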
Referring to Figures 26A-26B, a flowchart illustrating operation of the microprocessor 100 of Figure 24 to propagate a microcode patch 2500 of Figure 25 to multiple cores 102 of the microprocessor 100 is shown. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the microcode patch to all of the cores 102 of the microprocessor 100. More specifically, Figures 26A-26B describe the operation of the core that encounters the instruction to apply a patch to the microcode, whose flow begins at block 2602, and the operation of the other cores 102, whose flow begins at block 2652. It should be understood that multiple patches 2500 can be applied to the microprocessor 100 at different times during operation of the microprocessor 100. For example, a first patch 2500 may be applied, according to the atomic embodiments described herein, when the system that includes the microprocessor 100 is booted, such as during BIOS initialization, and a second patch 2500 may be applied after the operating system is running, which can be particularly useful for debugging the microprocessor 100.
At block 2602, one of the cores 102 encounters an instruction that instructs it to apply a microcode patch to the microprocessor 100. Preferably, the microcode patch is similar to the microcode patch described above. In one embodiment, the apply-microcode-patch instruction is an x86 WRMSR instruction. In response to the apply-microcode-patch instruction, the core 102 disables interrupts and traps to the microcode that implements the apply-microcode-patch instruction. It should be understood that the system software that includes the apply-microcode-patch instruction may include a multi-instruction sequence in preparation for applying the microcode patch. Preferably, however, the microcode patch is propagated to all of the cores in response to a single architectural instruction of the sequence, in an atomic fashion at the architectural instruction level. That is, once interrupts are disabled on the first core 102 (that is, the core 102 that encounters the apply-microcode-patch instruction at block 2602), interrupts remain disabled while the implementing microcode propagates the microcode patch and applies it to all of the cores 102 of the microprocessor 100 (for example, until after block 2652); furthermore, once interrupts are disabled on the other cores 102 (for example, at block 2652), they remain disabled until the microcode patch has been applied to all of the cores 102 of the microprocessor 100 (for example, until after block 2634). Advantageously, therefore, the microcode patch is propagated and applied to all of the cores 102 of the microprocessor 100 in an atomic fashion at the architectural instruction level. Flow proceeds to block 2604.
At block 2604, the core 102 obtains ownership of the hardware semaphore 118 of Fig. 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with patching microcode. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more specifically blocks 2004 and 2006. The hardware semaphore 118 is used because it is possible that, while one of the cores 102 is applying a patch 2500 in response to encountering an apply-microcode-patch instruction, a second core 102 encounters an apply-microcode-patch instruction and would begin applying a second patch 2500 in response, which could result in incorrect execution, for example due to misapplication of the first patch 2500. Flow proceeds to block 2606.
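The role of the semaphore, allowing only one core at a time to drive a patch, can be sketched as follows. This is a software analogy only: the class `HardwareSemaphore`, its methods, and the use of a Python lock are assumptions, and the sketch does not model what a core does after failing to obtain ownership.

```python
# Illustrative model of single-owner semantics for the patch semaphore.
# A threading.Lock stands in for the hardware; names are hypothetical.
import threading


class HardwareSemaphore:
    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None  # id of the core that currently owns the semaphore

    def try_acquire(self, core_id):
        """Attempt to take ownership; return True only if it was free."""
        if self._lock.acquire(blocking=False):
            self.owner = core_id
            return True
        return False

    def release(self, core_id):
        """Give up ownership; only the current owner may release."""
        if self.owner != core_id:
            raise RuntimeError("semaphore released by a core that does not own it")
        self.owner = None
        self._lock.release()


sem = HardwareSemaphore()
first = sem.try_acquire(core_id=0)         # core 0 starts applying a patch
second = sem.try_acquire(core_id=1)        # core 1 must not start a second patch
sem.release(core_id=0)                     # core 0 finishes and releases
third = sem.try_acquire(core_id=1)         # now core 1 may proceed
```

In this run `first` and `third` succeed while `second` fails, which is the mutual exclusion the description relies on to keep a second patch from being applied while the first is still in progress.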
At block 2606, the core 102 sends a patch message to the other cores 102 and sends an inter-core interrupt to the other cores 102. Preferably, the core 102 traps to microcode in response to the apply-microcode-patch instruction (block 2602) or in response to the interrupt (block 2652) and remains in microcode until block 2634, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted). Flow proceeds from block 2606 to block 2608.
At block 2652, one of the other cores 102 (that is, a core 102 other than the core 102 that encountered the apply-microcode-patch instruction at block 2602) is interrupted and receives the patch message as a result of the inter-core interrupt sent at block 2606. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (for example, at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the patch message. As described above, although the flow at block 2652 is described from the perspective of a single core 102, each of the other cores 102 (that is, each core 102 other than the core 102 at block 2602) is interrupted and receives the message at block 2652, and performs the steps of blocks 2608 through 2634. Flow proceeds from block 2652 to block 2608.
At block 2608, the core 102 writes a sync request with a sync condition value of 21 (denoted SYNC 21 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 21. Flow proceeds to decision block 2611.
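The sleep-until-all-have-written behaviour of a sync request resembles a software barrier, and can be approximated with threads. In the sketch below, the class `ControlUnit`, its `write_sync` method, and the use of Python threads in place of cores are all assumptions made for illustration; the real mechanism is hardware in the control unit 104.

```python
# Illustrative thread-based model of the SYNC rendezvous: each "core" writes
# a sync condition and sleeps until every core has written the same one.
import threading


class ControlUnit:
    def __init__(self, num_cores):
        self.num_cores = num_cores
        self._cond = threading.Condition()
        self._written = {}  # sync condition value -> set of core ids

    def write_sync(self, core_id, condition):
        """Record the sync request, then block until all cores have written it."""
        with self._cond:
            cores = self._written.setdefault(condition, set())
            cores.add(core_id)
            if len(cores) == self.num_cores:
                self._cond.notify_all()  # last writer wakes the sleepers
            else:
                self._cond.wait_for(lambda: len(cores) == self.num_cores)


log = []
log_lock = threading.Lock()


def core(core_id, control_unit):
    with log_lock:
        log.append(("before", core_id))
    control_unit.write_sync(core_id, 21)  # SYNC 21
    with log_lock:
        log.append(("after", core_id))


control_unit = ControlUnit(num_cores=3)
threads = [threading.Thread(target=core, args=(i, control_unit)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
phases = [phase for phase, _ in log]
```

Whatever order the threads run in, every "before" entry is logged ahead of every "after" entry, because no core continues past the sync point until the last core has written its request.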
At decision block 2611, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2612; otherwise, flow proceeds to block 2614.
At block 2612, the core 102 loads the immediate patch 2504 portion of the microcode patch 2500 into the uncore patch RAM 2408. In addition, the core 102 generates a checksum of the loaded immediate patch 2504 and verifies that it matches the checksum 2506. Preferably, the core 102 also sends information to the other cores 102 that indicates the length of the immediate patch 2504 and the location in the uncore patch RAM 2408 to which the immediate patch 2504 was loaded. Advantageously, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, if a previous RAM patch 2516 is present in the uncore patch RAM 2408, it is safe to overwrite the uncore patch RAM 2408 with the new patch, since there will be no hits in the patch CAM 2439 during this time (assuming that the microcode that implements the application of the microcode patch is not itself patched). In an alternative embodiment, the core 102 loads the immediate patch 2504 into the uncore PRAM 116, and before the immediate patch 2504 is executed at block 2616, the cores 102 copy the immediate patch 2504 from the uncore PRAM 116 to the uncore patch RAM 2408. Preferably, the core 102 loads the immediate patch into a portion of the uncore PRAM 116 that is reserved for this purpose, for example a portion of the uncore PRAM 116 that is not used for another purpose, such as holding values used by the microcode (for example, the core 102 state, TPM state, or effective microcode constants described above) and that may be patched (for example, at block 2632), so that any previous uncore PRAM patch 2518 is not clobbered. In one embodiment, the loading into the uncore PRAM 116 and the copying from the uncore PRAM 116 are performed in multiple stages in order to reduce the size required for the reserved portion. Flow proceeds to block 2614.
At block 2614, the core 102 writes a sync request with a sync condition value of 22 (denoted SYNC 22 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 22. Flow proceeds to block 2616.
At block 2616, the core 102 executes the immediate patch 2504 out of the uncore patch RAM 2408. As described above, in one embodiment, before the core 102 executes the immediate patch 2504, the core 102 copies the immediate patch 2504 from the uncore PRAM 116 to the uncore patch RAM 2408. Flow proceeds to block 2618.
At block 2618, the core 102 writes a sync request with a sync condition value of 23 (denoted SYNC 23 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 23. Flow proceeds to decision block 2621.
At decision block 2621, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2622; otherwise, flow proceeds to block 2624.
At block 2622, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into the uncore PRAM 116. In addition, the core 102 generates a checksum of the loaded CAM data 2508 and core PRAM patch 2512 and verifies that it matches the checksum 2514. Preferably, the core 102 also sends information to the other cores 102 that indicates the length of the CAM data 2508 and the core PRAM patch 2512 and the location in the uncore PRAM 116 to which they were loaded. Preferably, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into a reserved portion of the uncore PRAM 116 so that any previous uncore PRAM patch 2518 is not clobbered, in a manner similar to that described with respect to block 2612. Flow proceeds to block 2624.
At block 2624, the core 102 writes a sync request with a sync condition value of 24 (denoted SYNC 24 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 24. Flow proceeds to block 2626.
At block 2626, the core 102 loads the CAM data 2508 from the uncore PRAM 116 into its patch CAM 2439. In addition, the core 102 loads the core PRAM patch 2512 from the uncore PRAM 116 into its core PRAM 2499. Advantageously, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, it is safe to load the patch CAM 2439 with the CAM data 2508 even though the corresponding RAM patch 2516 has not yet been written to the uncore patch RAM 2408 (which will occur at block 2632), since there will be no hits in the patch CAM 2439 during this time (assuming that the microcode that implements the application of the microcode patch is not itself patched). Furthermore, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and interrupts are not enabled on any core 102 until the patch 2500 has been propagated to all of the cores 102, any update to the core PRAM 2499 made by the core PRAM patch 2512, including updates that change values that may affect the operation of the core 102 (for example, feature settings), is guaranteed not to be architecturally visible until the patch 2500 has been propagated to all of the cores 102. Flow proceeds to block 2628.
At block 2628, the core 102 writes a sync request with a sync condition value of 25 (denoted SYNC 25 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 25. Flow proceeds to decision block 2631.
At decision block 2631, the core 102 determines whether it is the core 102 that encountered the apply-microcode-patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2632; otherwise, flow proceeds to block 2634.
At block 2632, the core 102 loads the RAM patch 2516 into the uncore patch RAM 2408. In addition, the core 102 loads the uncore PRAM patch 2518 into the uncore PRAM 116. In one embodiment, the uncore PRAM patch 2518 includes code to be executed by the SPU 2423. In one embodiment, the uncore PRAM patch 2518 includes updates to values used by the microcode, as described above. In one embodiment, the uncore PRAM patch 2518 includes both SPU 2423 code and updates to values used by the microcode. Advantageously, loading the RAM patch 2516 at this point is safe because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch; more specifically, the patch CAMs 2439 of all of the cores 102 have already been loaded with the new CAM data 2508 (for example, at block 2626), and there will be no hits in the patch CAM 2439 during this time (assuming that the microcode that implements the application of the microcode patch is not itself patched). Furthermore, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, and interrupts are not enabled on any core 102 until the patch 2500 has been propagated to all of the cores 102, any update to the uncore PRAM 116 made by the uncore PRAM patch 2518, including updates that change values that may affect the operation of the cores 102 (for example, feature settings), is guaranteed not to be architecturally visible until the patch 2500 has been propagated to all of the cores 102. Flow proceeds to block 2634.
At block 2634, the core 102 writes a sync request with a sync condition value of 26 (denoted SYNC 26 in Figures 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 26. Flow ends at block 2634.
After block 2634, if code for the SPU 2423 was loaded into the uncore PRAM 116, the patching core 102 also causes the SPU 2423 to begin executing that code, as described with respect to Figure 30. In addition, after block 2634, the patching core 102 releases the hardware semaphore 118 that was obtained at block 2604. Furthermore, after block 2634, the cores 102 re-enable interrupts.
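The overall staging of Figures 26A-26B, in which the patching core loads each portion between sync points and every core consumes it after the next sync point, can be summarized with a thread-based sketch. This is a simplified model under several assumptions: `threading.Barrier` stands in for the SYNC 21 through SYNC 26 requests, dictionaries stand in for the patch RAM, PRAM and patch CAMs, the patch contents are invented, and the semaphore, interrupts, checksums and PRAM patches are left out.

```python
# Illustrative staged propagation of a patch to all cores. Barriers model
# SYNC 21..26; memories are dictionaries; patch contents are made up.
import threading

NUM_CORES = 3
barrier = threading.Barrier(NUM_CORES)

uncore_patch_ram = {}                              # shared by all cores
uncore_pram = {}                                   # shared staging area
core_cams = [dict() for _ in range(NUM_CORES)]     # one patch CAM per core
executed_immediate = [False] * NUM_CORES

patch = {
    "immediate": "imm_code",
    "cam": {0x101: 0x10},
    "ram": {0x10: "uop_fixed"},
}


def apply_patch(core_id, is_patching_core):
    barrier.wait()                                  # SYNC 21 (block 2608)
    if is_patching_core:                            # block 2612
        uncore_patch_ram["immediate"] = patch["immediate"]
    barrier.wait()                                  # SYNC 22 (block 2614)
    # Block 2616: every core executes the immediate patch exactly once.
    executed_immediate[core_id] = uncore_patch_ram["immediate"] == "imm_code"
    barrier.wait()                                  # SYNC 23 (block 2618)
    if is_patching_core:                            # block 2622
        uncore_pram["cam"] = dict(patch["cam"])
    barrier.wait()                                  # SYNC 24 (block 2624)
    core_cams[core_id] = dict(uncore_pram["cam"])   # block 2626
    barrier.wait()                                  # SYNC 25 (block 2628)
    if is_patching_core:                            # block 2632
        uncore_patch_ram.clear()                    # may overwrite immediate patch
        uncore_patch_ram.update(patch["ram"])
    barrier.wait()                                  # SYNC 26 (block 2634)


threads = [
    threading.Thread(target=apply_patch, args=(i, i == 0))  # core 0 is the patcher
    for i in range(NUM_CORES)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The barriers guarantee that no core reads a shared structure before the patching core has finished writing it, and that the patching core does not overwrite the immediate patch until every core has executed it, which is the ordering the six sync points provide in the flowchart.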
Referring to Figure 27, a timing diagram illustrating an example of the operation of the microprocessor according to the flowchart of Figures 26A-26B is shown. In this example, a microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, the sequence of events proceeds as described below.
Core 0 receives a request to patch the microcode (per square 2602) and in response obtains the hardware semaphore 118 (per square 2604). Core 0 then sends a microcode patching message and an interrupt to core 1 and core 2 (per square 2606). Core 0 then writes a SYNC 21 and goes to sleep (per square 2608).
Each of core 1 and core 2 is eventually interrupted from its current task and reads the message (per square 2652). In response, each of core 1 and core 2 writes a SYNC 21 and goes to sleep (per square 2608). As shown, the cores may write the SYNC 21 at different times, for example because of the latency of the instruction that is executing when the interrupt is asserted.
When all the cores have written the SYNC 21, the control unit 104 wakes all the cores simultaneously (per square 2608). Core 0 then loads the instant repairing 2504 into the non-core PRAM 116 (per square 2612), writes a SYNC 22 and goes to sleep (per square 2614). Each of core 1 and core 2 writes a SYNC 22 and goes to sleep (per square 2614).
When all the cores have written the SYNC 22, the control unit 104 wakes all the cores simultaneously (per square 2614). Each core executes the instant repairing 2504 (per square 2616), writes a SYNC 23 and goes to sleep (per square 2618).
When all the cores have written the SYNC 23, the control unit 104 wakes all the cores simultaneously (per square 2618). Core 0 then loads the CAM data 2508 and the core PRAM repairing 2512 into the non-core PRAM 116 (per square 2622), writes a SYNC 24 and goes to sleep (per square 2624).
When all the cores have written the SYNC 24, the control unit 104 wakes all the cores simultaneously (per square 2624). Each core then loads its repairing CAM 2439 with the CAM data 2508 and loads its core PRAM 2499 with the core PRAM repairing 2512 (per square 2626), writes a SYNC 25 and goes to sleep (per square 2628).
When all the cores have written the SYNC 25, the control unit 104 wakes all the cores simultaneously (per square 2628). Core 0 then loads the RAM repairing 2516 into the non-core repairing RAM 2408 and loads the non-core PRAM repairing 2518 into the non-core PRAM 116 (per square 2632), writes a SYNC 26 and goes to sleep (per square 2634).
When all the cores have written the SYNC 26, the control unit 104 wakes all the cores simultaneously (per square 2634). As described above, if program code for the SPU 2423 was loaded into the non-core PRAM 116 in square 2632, the core 102 then also starts execution of the program code, as described below with respect to Figure 30.
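By way of illustration only, the sequence of Figure 27 can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the implementation of the embodiments: Python threads stand in for core 0 through core 2, a reusable `threading.Barrier` stands in for the control unit 104 waking all the cores once each has written the same SYNC, and the names `non_core_pram`, `core_state`, `run_core` and `simulate` are hypothetical.

```python
import threading

NUM_CORES = 3
# The barrier models the control unit 104: a core "sleeps" in wait()
# until every core has written the same SYNC.
sync = threading.Barrier(NUM_CORES)
non_core_pram = {}               # shared; stands in for non-core PRAM 116
non_core_repairing_ram = {}      # shared; stands in for non-core repairing RAM 2408
core_state = [dict() for _ in range(NUM_CORES)]   # per-core CAM and PRAM contents
log = []
log_lock = threading.Lock()

def run_core(core_id, patch):
    is_core_0 = core_id == 0     # core 0 is the core that received the request
    sync.wait()                                              # SYNC 21
    if is_core_0:
        non_core_pram["instant"] = patch["instant"]          # square 2612
    sync.wait()                                              # SYNC 22
    with log_lock:                                           # square 2616
        log.append((core_id, "ran " + non_core_pram["instant"]))
    sync.wait()                                              # SYNC 23
    if is_core_0:                                            # square 2622
        non_core_pram["cam"] = patch["cam"]
        non_core_pram["core_pram"] = patch["core_pram"]
    sync.wait()                                              # SYNC 24
    core_state[core_id]["cam"] = non_core_pram["cam"]        # square 2626
    core_state[core_id]["pram"] = non_core_pram["core_pram"]
    sync.wait()                                              # SYNC 25
    if is_core_0:                                            # square 2632
        non_core_repairing_ram["ram"] = patch["ram"]
        non_core_pram["non_core_pram"] = patch["non_core_pram"]
    sync.wait()                                              # SYNC 26

def simulate(patch):
    threads = [threading.Thread(target=run_core, args=(i, patch))
               for i in range(NUM_CORES)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return core_state

patch = {"instant": "instant-2504", "cam": "cam-2508", "core_pram": "pram-2512",
         "ram": "ram-2516", "non_core_pram": "pram-2518"}
simulate(patch)
```

Because no core passes a SYNC before every core has reached it, each core reads the shared memory only after core 0 has finished writing it, which is the ordering property the timing diagram illustrates.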
Refer now to Figure 28, which is a block diagram showing a multi-core microprocessor 100 according to another embodiment. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, the microprocessor 100 of Figure 28 does not include a non-core repairing RAM; instead, each core 102 includes a core repairing RAM 2808, which serves a function similar to that of the non-core repairing RAM 2408 of Figure 24. The core repairing RAM 2808 in each core 102 is private to its respective core 102 and is not shared with the other cores 102.
Refer now to Figures 29A~29B, which are a flowchart showing operation of the microprocessor 100 of Figure 28, according to another embodiment, to propagate a microcode patch to the multiple cores 102 of the microprocessor 100. In the alternate embodiment of Figure 28 and Figures 29A~29B, the repairing 2500 of Figure 25 may be modified so that the checksum 2514 follows the RAM repairing 2516 rather than the core PRAM repairing 2512, which enables the microprocessor 100 to verify the integrity of the CAM data 2508, the core PRAM repairing 2512 and the RAM repairing 2516 after they have been loaded into the microprocessor 100 (for example, in square 2922 of Figures 29A~29B). The flowchart of Figures 29A~29B is similar in many respects to the flowchart of Figures 26A~26B, and like-numbered squares are similar. However, square 2912 replaces square 2612, square 2916 replaces square 2616, square 2922 replaces square 2622, square 2926 replaces square 2626 and square 2932 replaces square 2632. In square 2912, the core 102 loads the instant repairing 2504 into the non-core PRAM 116 (rather than into a non-core repairing RAM). In square 2916, before executing the instant repairing 2504, the core 102 copies the instant repairing 2504 from the non-core PRAM 116 to its core repairing RAM 2808. In square 2922, the core 102 loads the RAM repairing 2516 into the non-core PRAM 116 in addition to the CAM data 2508 and the core PRAM repairing 2512. In square 2926, in addition to loading the CAM data 2508 from the non-core PRAM 116 into its repairing CAM 2439 and loading the core PRAM repairing 2512 from the non-core PRAM 116 into its core PRAM 2499, the core 102 also loads the RAM repairing 2516 from the non-core PRAM 116 into its core repairing RAM 2808. In square 2932, unlike square 2632 of Figures 26A~26B, the core 102 does not load the RAM repairing 2516 into a non-core repairing RAM.
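As a rough sketch of squares 2922 and 2926 in this alternate embodiment, the following stages all three parts of the repairing in shared memory, verifies them, and then copies them into each core's private memories. The additive checksum is an assumption chosen purely for illustration (the text does not specify the checksum algorithm), and the function and key names are hypothetical.

```python
def checksum(parts):
    """Additive 32-bit checksum over byte strings (an assumed, illustrative algorithm)."""
    return sum(sum(p) for p in parts) & 0xFFFFFFFF

def load_patch(patch, num_cores):
    order = ("cam_data", "core_pram_repairing", "ram_repairing")
    # Square 2922: stage all three parts, including the RAM part, in shared memory.
    non_core_pram = {name: patch[name] for name in order}
    # The checksum covers all three parts once they have been loaded.
    if checksum(non_core_pram[name] for name in order) != patch["checksum"]:
        raise ValueError("patch failed integrity check")
    # Square 2926: each core copies the staged parts into its private memories.
    return [{"repairing_cam": non_core_pram["cam_data"],
             "core_pram": non_core_pram["core_pram_repairing"],
             "core_repairing_ram": non_core_pram["ram_repairing"]}
            for _ in range(num_cores)]

patch = {"cam_data": b"\x01\x02", "core_pram_repairing": b"\x03",
         "ram_repairing": b"\x04\x05", "checksum": 15}
cores = load_patch(patch, 3)
```

Every core ends up with its own copy of the RAM part, which is why nothing remains to be loaded into a shared repairing RAM in square 2932.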
As may be observed from the embodiments described above, the atomic propagation of the microcode patch 2500 to each of the relevant memories 2439/2499/2808 of the cores 102 of the microprocessor 100 and to the relevant non-core memories 2408/116 is advantageously performed in a manner that ensures the integrity and effectiveness of the repairing 2500, even in the presence of multiple simultaneously executing cores 102 that share resources and that might otherwise destroy (clobber) portions of one another's patches if the patch were applied in the conventional manner.
Repairing the service processor program code
Refer now to Figure 30, which is a flowchart showing operation of the microprocessor 100 of Figure 24 to patch program code of a service processor. Flow begins at square 3002.
In square 3002, the core 102 loads program code to be executed by the SPU 2423 into the non-core PRAM 116 at a patch address specified by the repairing, as described above with respect to square 2632 of Figures 26A~26B. Flow proceeds to square 3004.
In square 3004, the core 102 controls the SPU 2423 to execute the program code at the patch address, that is, the address in the non-core PRAM 116 at which the SPU 2423 program code was written in square 3002. In one embodiment, the SPU 2423 is configured to fetch its reset vector (that is, the address at which the SPU 2423 begins fetching instructions after it is reset) from the initial address buffer 2497, and the core 102 writes the patch address to the initial address buffer 2497 and then writes to a control buffer, which causes the SPU 2423 to be reset. Flow proceeds to square 3006.
In square 3006, the SPU 2423 begins fetching program code (that is, fetches its first instruction) at the patch address, that is, the address in the non-core PRAM 116 at which the SPU 2423 program code was written in square 3002. Typically, the SPU 2423 patch code residing in the non-core PRAM 116 will perform a jump to SPU 2423 program code residing in the non-core ROM 2425. Flow ends at square 3006.
The ability to patch the SPU 2423 program code may be particularly useful. For example, the SPU 2423 may be used for performance testing that is largely temporary in nature; that is, it may be undesirable to make the performance-testing SPU 2423 program code a permanent part of the microprocessor 100, and preferable to include it only in development parts rather than, for example, in production parts. In another example, the SPU 2423 may be used to find and/or fix bugs. In another example, the SPU 2423 may be used to configure the microprocessor 100.
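The reset-vector mechanism of squares 3004 and 3006 can be sketched with a toy model. This is an illustrative assumption only: the `ServiceProcessor` class, its three-operation instruction format and the addresses used are hypothetical, and they merely show a processor that starts fetching at whatever address was written to its start-address buffer and then jumps from patch code back into ROM code.

```python
class ServiceProcessor:
    """Toy SPU that begins fetching at a programmable start address after reset."""

    def __init__(self, rom, pram):
        self.rom = rom                     # list of (op, arg); stands in for non-core ROM 2425
        self.pram = pram                   # dict address -> (op, arg); stands in for non-core PRAM 116
        self.start_address = ("rom", 0)    # stands in for initial address buffer 2497

    def reset(self):
        """Fetch from the start address until a halt; return the fetch trace."""
        region, index = self.start_address
        trace = []
        while True:
            memory = self.rom if region == "rom" else self.pram
            op, arg = memory[index]
            trace.append((region, index, op))
            if op == "halt":
                return trace
            if op == "jump":
                region, index = arg        # continue at the jump target
            else:
                index += 1                 # fall through to the next instruction

rom = [("rom_work", None), ("halt", None)]
pram = {0x40: ("patched_work", None), 0x41: ("jump", ("rom", 0))}   # square 3002
spu = ServiceProcessor(rom, pram)
spu.start_address = ("pram", 0x40)   # the core writes the patch address (square 3004)
trace = spu.reset()                  # the core triggers the reset; fetch starts at the patch
```

The trace shows the patch code running first and the ROM code running only after the jump, matching the typical case described for square 3006.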
Atomic propagation of updates to per-core instantiated architecturally visible storage resources
Refer now to Figure 31, which is a block diagram showing a multi-core microprocessor 100 according to another embodiment. The microprocessor 100 is similar in many respects to the microprocessor 100 of Figure 24. However, each core 102 of the microprocessor 100 of Figure 31 also includes architecturally visible memory type range registers (MTRRs) 3102. That is, each core 102 instantiates the architecturally visible MTRRs 3102, even though system software requires the MTRRs 3102 to be consistent across all the cores 102 (as described in more detail below). The MTRRs 3102 are one example of a per-core instantiated architecturally visible storage resource; other embodiments of per-core instantiated architecturally visible storage resources are described below. (Although not shown, each core 102 also includes the core PRAM 2499, the core microcode ROM 2404 and the repairing CAM 2439 of Figure 24 and, in one embodiment, the core microcode repairing RAM 2808 of Figure 28.)
The MTRRs 3102 provide a way for system software to associate a memory type with multiple different physical address ranges in the system memory address space of the microprocessor 100. Examples of the different memory types include strong uncacheable, uncacheable, write-combining, write-through, write-back and write-protected. Each MTRR 3102 specifies (explicitly or implicitly) a memory range and its memory type. The collective values of the MTRRs 3102 define a memory map that specifies the memory type of the different memory ranges. In one embodiment, the MTRRs 3102 are similar to those described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly in section 11.11, which is incorporated herein by reference and forms part of this specification.
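To make the range-to-type association concrete, the following sketch checks one variable range register pair against a physical address. It assumes the register layout of the Intel manual cited above (memory type in bits 7:0 of PHYSBASE, valid bit in bit 11 of PHYSMASK, base and mask in the bits above bit 11) and a 36-bit physical address in the example values. The function name is hypothetical, and overlap and default-type rules are deliberately left out.

```python
# Memory type encodings as defined for x86 MTRRs.
MEMORY_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB"}
VALID_BIT = 1 << 11          # valid bit in PHYSMASK
PAGE_BITS = 0xFFF            # low 12 bits hold type/valid/reserved fields

def variable_mtrr_type(phys_addr, physbase, physmask):
    """Return the memory type if phys_addr falls in the range, else None."""
    if not physmask & VALID_BIT:
        return None                          # register pair is disabled
    mask = physmask & ~PAGE_BITS
    base = physbase & ~PAGE_BITS
    if (phys_addr & mask) == (base & mask):  # address matches under the mask
        return MEMORY_TYPES.get(physbase & 0xFF)
    return None

# A 256 MiB write-back range starting at 0x80000000 (36-bit physical addresses).
PHYSBASE = 0x80000000 | 6
PHYSMASK = 0xFF0000000 | VALID_BIT
inside = variable_mtrr_type(0x80001000, PHYSBASE, PHYSMASK)
outside = variable_mtrr_type(0x90000000, PHYSBASE, PHYSMASK)
```

Since both registers of a pair take part in every lookup, a core that sees a new base with an old mask would briefly describe a range that system software never intended, which is one reason the pair is propagated together in the flow described below.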
It is desirable that the memory map defined by the MTRRs 3102 be identical for all the cores 102 of the microprocessor 100 so that software running on the microprocessor 100 has a consistent view of memory. However, conventional processors have no hardware support for maintaining consistency of the MTRRs among the cores of a multi-core processor. As the note at the bottom of page 11-20 of Volume 3 of the Intel manual cited above states, "The P6 and more recent processor families provide no hardware support for maintaining [consistency of the MTRR values]." System software is therefore responsible for maintaining MTRR consistency across the cores. Section 11.11.8 of the Intel manual cited above describes an algorithm by which system software maintains consistency when updating the MTRRs of a multi-core processor, in which every core executes instructions to update its own MTRRs.
In contrast, embodiments are described herein in which system software may update an instance of an MTRR 3102 on one of the cores 102, and that core 102 advantageously propagates the update, in an atomic fashion, to the respective instance of the MTRR 3102 on all the cores 102 of the microprocessor 100 (in a manner similar to the way the microcode patch is applied in the embodiments of Figures 24 through 30 described above). This provides a way of maintaining consistency of the MTRRs 3102 of the different cores 102 at the architectural instruction level.
Refer now to Figure 32, which is a flowchart showing operation of the microprocessor 100 of Figure 31 to propagate an MTRR 3102 update to the multiple cores 102 of the microprocessor 100. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the MTRR 3102 update to all the cores 102 of the microprocessor 100. More specifically, Figure 32 describes the operation of the core that encounters the instruction to update the MTRR 3102, whose flow begins at square 3202, and the operation of the other cores 102, whose flow begins at square 3252.
In square 3202, one of the cores 102 encounters an instruction that instructs the core to update its MTRR 3102. That is, the MTRR update instruction includes an MTRR 3102 identifier and an update value to be written to the MTRR 3102. In one embodiment, the MTRR update instruction is an x86 WRMSR instruction, which specifies the update value in the EAX:EDX registers and the MTRR 3102 identifier in the ECX register, the identifier being an MSR address within the MSR address space of the core 102. In response to the MTRR update instruction, the core 102 disables interrupts and traps to the microcode that implements the MTRR update instruction. It should be understood that the system software that includes the MTRR update instruction may include a sequence of multiple instructions in preparation for the update of the MTRR 3102. Preferably, however, the MTRRs 3102 of all the cores 102 are updated atomically at the architectural instruction level in response to a single architectural instruction of the sequence. That is, once interrupts are disabled on the first core 102 (for example, in square 3202, in which the core 102 encounters the MTRR update instruction), they remain disabled while the microcode propagates the new MTRR 3102 value to all the cores 102 of the microprocessor 100 (for example, until after square 3218). Furthermore, once interrupts are disabled on the other cores 102 (for example, in square 3252), they remain disabled until the MTRRs 3102 of all the cores 102 of the microprocessor 100 have been updated (for example, until after square 3218). Advantageously, therefore, the new MTRR 3102 value is propagated atomically, at the architectural instruction level, to all the cores 102 of the microprocessor 100. Flow proceeds to square 3204.
In square 3204, the core 102 obtains ownership of the hardware semaphore 118 of Fig. 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with the MTRRs 3102. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to Figure 20, more specifically squares 2004 and 2006. The hardware semaphore 118 is used because one of the cores 102 may be performing an MTRR 3102 update in response to encountering an MTRR update instruction when a second core 102 encounters an MTRR update instruction, in response to which the second core would begin to update the MTRR 3102, which could result in incorrect execution. Flow proceeds to square 3206.
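The role of the semaphore can be illustrated with a minimal model in which at most one core at a time owns the right to run the update sequence. The `HardwareSemaphore` class and its methods are hypothetical names for illustration; what a core does while it waits for ownership is covered by the description of Figure 20 and is not modeled here.

```python
class HardwareSemaphore:
    """Minimal ownership model: at most one core owns the semaphore at a time."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, core_id):
        """Take ownership if free; report whether core_id is now the owner."""
        if self.owner is None:
            self.owner = core_id
        return self.owner == core_id

    def release(self, core_id):
        if self.owner != core_id:
            raise ValueError("only the owning core may release the semaphore")
        self.owner = None

semaphore = HardwareSemaphore()
first = semaphore.try_acquire(0)    # core 0 encounters an update instruction
second = semaphore.try_acquire(1)   # core 1 encounters one while core 0 is busy
semaphore.release(0)                # core 0 finishes its propagation
retry = semaphore.try_acquire(1)    # core 1 may now begin its own update
```

Serializing the two updates in this way keeps a second propagation from starting in the middle of the first.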
In square 3206, the core 102 sends an MTRR update message to the other cores 102 and sends an inter-core interrupt to the other cores 102. Preferably, the core 102 traps to the microcode in response to the MTRR update instruction (in square 3202) or in response to the interrupt (in square 3252) and remains in the microcode, during which time interrupts are disabled (that is, the microcode does not allow itself to be interrupted), until square 3218. Flow proceeds to square 3208.
In square 3252, one of the other cores 102 (that is, a core other than the core 102 that encountered the MTRR update instruction in square 3202) is interrupted by the inter-core interrupt sent in square 3206 and receives the MTRR update message. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (for example, at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the MTRR update message. As described above, although the flow at square 3252 is described from the perspective of a single core 102, each of the other cores 102 (that is, each core 102 not at square 3202) is interrupted and receives the message in square 3252 and performs the steps of squares 3208 through 3218. Flow proceeds from square 3252 to square 3208.
In square 3208, the core 102 writes a synchronization request for synchronous situation 31 (denoted SYNC 31 in Figure 32) to its synchronous buffer 108, whereupon the control unit 104 puts the core 102 to sleep and then wakes it once all the cores 102 have written a SYNC 31. Flow proceeds to decision block 3211.
In decision block 3211, the core 102 determines whether it is the core 102 that encountered the MTRR update instruction in square 3202 (as opposed to a core 102 that received the MTRR update message in square 3252). If so, flow proceeds to square 3212; otherwise, flow proceeds to square 3214.
In square 3212, the core 102 loads into the non-core PRAM 116 the MTRR identifier specified by the MTRR update instruction and the MTRR update value with which the MTRR is to be updated, so that all the other cores 102 can see them. In the case of an x86 embodiment, the MTRRs 3102 include (1) fixed range MTRRs, each of which comprises a single 64-bit MSR that is updated by a single WRMSR instruction, and (2) variable range MTRRs, each of which comprises two 64-bit MSRs, each written by a different WRMSR instruction; that is, the two WRMSR instructions specify different MSR addresses. For a variable range MTRR, one of the MSRs (the PHYSBASE register) includes a base address of the memory range and a type field that specifies the memory type, and the other MSR (the PHYSMASK register) includes a valid bit and a mask field that sets the range mask. Preferably, the MTRR update value that the core 102 loads into the non-core PRAM 116 is as follows.
1. If the identified MSR is the PHYSMASK register, the core 102 loads into the non-core PRAM 116 a 128-bit update value that includes both the new 64-bit value specified by the WRMSR instruction (which includes the valid bit and mask values) and the current value of the PHYSBASE register (which includes the base and type values).
2. If the identified MSR is the PHYSBASE register:
a. If the valid bit in the PHYSMASK register is currently set, the core 102 loads into the non-core PRAM 116 a 128-bit update value that includes both the new 64-bit value specified by the WRMSR instruction (which includes the base and type values) and the current value of the PHYSMASK register (which includes the valid bit and mask values).
b. If the valid bit in the PHYSMASK register is currently clear, the core 102 loads into the non-core PRAM 116 a 64-bit update value that includes only the new 64-bit value specified by the WRMSR instruction (which includes the base and type values).
In addition, the core 102 sets a flag in the non-core PRAM 116 if the update value written is a 128-bit value and clears the flag if the update value is a 64-bit value. Flow proceeds from square 3212 to square 3214.
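The selection rules above can be sketched as a small function that builds what the updating core stages in shared memory. This is a simplified illustration under stated assumptions: a single PHYSBASE/PHYSMASK pair is modeled, the valid bit is taken to be bit 11 of PHYSMASK as in the cited Intel manual, a fixed range MTRR is treated as staging only its single 64-bit value, and the name `stage_update` and the dictionary layout are hypothetical.

```python
VALID_BIT = 1 << 11   # valid bit position in PHYSMASK, per the cited Intel manual

def stage_update(identifier, new_value, current):
    """Build the identifier, value(s) and flag written in square 3212.

    identifier: "PHYSBASE", "PHYSMASK", or another name for a single-MSR MTRR.
    current: this core's current PHYSBASE and PHYSMASK values.
    """
    values = {identifier: new_value}
    if identifier == "PHYSMASK":
        values["PHYSBASE"] = current["PHYSBASE"]      # case 1: 128-bit value
    elif identifier == "PHYSBASE" and current["PHYSMASK"] & VALID_BIT:
        values["PHYSMASK"] = current["PHYSMASK"]      # case 2a: 128-bit value
    # Case 2b and single-MSR MTRRs: only the new 64-bit value is staged.
    return {"identifier": identifier, "values": values, "flag": len(values) == 2}

enabled = {"PHYSBASE": 0x80000006, "PHYSMASK": 0xFF0000800}
disabled = {"PHYSBASE": 0x80000006, "PHYSMASK": 0xFF0000000}
case_1 = stage_update("PHYSMASK", 0xFE0000800, enabled)
case_2a = stage_update("PHYSBASE", 0x90000006, enabled)
case_2b = stage_update("PHYSBASE", 0x90000006, disabled)
```

Staging the companion register in cases 1 and 2a lets every core write base and mask together, so no core is left with a pair that mixes old and new halves.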
In square 3214, the core 102 writes a synchronization request for synchronous situation 32 (denoted SYNC 32 in Figure 32) to its synchronous buffer 108, whereupon the control unit 104 puts the core 102 to sleep and then wakes it once all the cores 102 have written a SYNC 32. Flow proceeds to square 3216.
In square 3216, the core 102 reads from the non-core PRAM 116 the MTRR 3102 identifier and the MTRR update value written in square 3212, and updates its MTRR 3102 with the update value. Advantageously, the MTRR update value is propagated atomically, so that any update of the MTRR 3102 that may affect the operation of a respective core 102 is guaranteed not to be architecturally visible until the update value has been propagated to the MTRRs 3102 of all the cores 102, because all the cores 102 are known to be executing the same microcode that implements the MTRR update instruction, and interrupts will not be taken on any core 102 until the update value has been propagated to the respective MTRR 3102 of every core 102. As described above with respect to square 3212 of the present embodiment, if the flag was set in square 3212, the core 102 also updates (in addition to the specified MSR) the PHYSMASK or PHYSBASE register; otherwise, if the flag is clear, the core 102 updates only the specified MSR. Flow proceeds to square 3218.
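The receiving side of square 3216 can be sketched as follows. The staged record layout (`identifier`, `values`, `flag`) and the function name `apply_update` are assumptions made for illustration, and each core's MSRs are modeled as a plain dictionary.

```python
def apply_update(core_msrs, staged):
    """Apply a staged update to one core's private copy of its MSRs."""
    target = staged["identifier"]
    core_msrs[target] = staged["values"][target]     # always write the specified MSR
    if staged["flag"]:
        # 128-bit case: also write the companion register of the pair.
        for name, value in staged["values"].items():
            if name != target:
                core_msrs[name] = value
    return core_msrs

staged_128 = {"identifier": "PHYSMASK", "flag": True,
              "values": {"PHYSMASK": 0xFE0000800, "PHYSBASE": 0x80000006}}
staged_64 = {"identifier": "PHYSBASE", "flag": False,
             "values": {"PHYSBASE": 0x90000006}}

cores = [{"PHYSBASE": 0, "PHYSMASK": 0} for _ in range(3)]
for core in cores:
    apply_update(core, staged_128)
single = apply_update({"PHYSBASE": 0, "PHYSMASK": 0}, staged_64)
```

After the loop every core holds the same pair of values, which is the consistency property the flow is meant to provide.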
In square 3218, the core 102 writes a synchronization request for synchronous situation 33 (denoted SYNC 33 in Figure 32) to its synchronous buffer 108, whereupon the control unit 104 puts the core 102 to sleep and then wakes it once all the cores 102 have written a SYNC 33. Flow ends at square 3218.
After square 3218, the core 102 that obtained the hardware semaphore 118 in square 3204 releases it. Furthermore, after square 3218, the core 102 re-enables interrupts.
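The complete flow of Figure 32 can be sketched as a simulation. This is a minimal sketch under stated assumptions, not the microcode of the embodiments: threads stand in for cores, a `threading.Barrier` stands in for the control unit 104, a `threading.Lock` stands in for the hardware semaphore 118, starting the other threads stands in for the message and inter-core interrupt of square 3206, and the MSR address `0x200` and all names are illustrative.

```python
import threading

NUM_CORES = 4
control_unit = threading.Barrier(NUM_CORES)  # wakes all cores once each wrote the SYNC
hardware_semaphore = threading.Lock()        # stands in for hardware semaphore 118
non_core_pram = {}                           # stands in for non-core PRAM 116
MTRR_ID = 0x200                              # illustrative MSR address
mtrr = [{MTRR_ID: 0} for _ in range(NUM_CORES)]   # each core's private MTRR copy
interrupts_enabled = [True] * NUM_CORES

def core_flow(core_id, update):
    """update is (identifier, value) on the initiating core and None on the others."""
    interrupts_enabled[core_id] = False                  # square 3202 or 3252
    if update is not None:
        hardware_semaphore.acquire()                     # square 3204
    control_unit.wait()                                  # SYNC 31, square 3208
    if update is not None:                               # decision block 3211
        non_core_pram["id"], non_core_pram["value"] = update     # square 3212
    control_unit.wait()                                  # SYNC 32, square 3214
    mtrr[core_id][non_core_pram["id"]] = non_core_pram["value"]  # square 3216
    control_unit.wait()                                  # SYNC 33, square 3218
    if update is not None:
        hardware_semaphore.release()
    interrupts_enabled[core_id] = True

def update_mtrr(initiating_core, identifier, value):
    threads = []
    for core_id in range(NUM_CORES):
        update = (identifier, value) if core_id == initiating_core else None
        threads.append(threading.Thread(target=core_flow, args=(core_id, update)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

update_mtrr(initiating_core=2, identifier=MTRR_ID, value=0x80000006)
```

No core re-enables interrupts before SYNC 33, so in this model no core can run other code while the cores hold different values.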
As may be observed from Figures 31 and 32, system software running on the microprocessor 100 of Figure 31 may advantageously execute an MTRR update instruction on a single core 102 of the microprocessor 100 to update the specified MTRR 3102 of all the cores 102 of the microprocessor 100, rather than executing an MTRR update instruction individually on each core 102, which may benefit system integrity.
One particular MTRR 3102 instantiated in each core 102 is a system management range register (SMRR) 3102. The memory range specified by the SMRR 3102 is referred to as the SMRAM region because it holds the program code and data associated with the operation of system management mode (SMM), such as a system management interrupt (SMI) handler. When program code running on a core 102 attempts to access the SMRAM region, the core 102 allows the access only if the core 102 is running in SMM; otherwise, the core 102 ignores writes to the SMRAM region and returns a fixed value for each byte read from the SMRAM region. In addition, if a core 102 running in SMM attempts to execute program code outside the SMRAM region, the core 102 asserts a machine check exception. Furthermore, the core 102 allows program code to write the SMRR 3102 only when the core 102 is running in SMM. This helps protect the SMM program code and data in the SMRAM region. In one embodiment, the SMRR 3102 is similar to that described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly in sections 11.11.2.4 and 34.4.2.1, which is incorporated herein by reference and forms part of this specification.
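The access rules just described can be summarized in a toy model. This is an illustrative sketch only: the `Core` class, the fixed read value `0xFF`, the behavior of returning `False` for a rejected SMRR write, and the restriction of the fetch check to the SMM case are assumptions, since the text states only that reads return a fixed value and that SMRR writes are allowed only in SMM.

```python
class MachineCheck(Exception):
    """Raised when SMM code tries to execute outside the SMRAM region."""

class Core:
    FIXED_READ = 0xFF   # assumed value; the text only says "a fixed value"

    def __init__(self, smram_base, smram_size):
        self.base, self.limit = smram_base, smram_base + smram_size
        self.in_smm = False
        self.memory = {}

    def _in_smram(self, addr):
        return self.base <= addr < self.limit

    def read(self, addr):
        if self._in_smram(addr) and not self.in_smm:
            return self.FIXED_READ          # non-SMM reads see a fixed value
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        if self._in_smram(addr) and not self.in_smm:
            return                          # non-SMM writes are ignored
        self.memory[addr] = value

    def fetch_in_smm(self, addr):
        """Instruction fetch while in SMM (the only fetch case the text describes)."""
        if not self.in_smm:
            raise ValueError("this sketch models SMM fetches only")
        if not self._in_smram(addr):
            raise MachineCheck(hex(addr))
        return self.memory.get(addr, 0)

    def write_smrr(self, base, size):
        if not self.in_smm:
            return False                    # assumed: the write is not performed
        self.base, self.limit = base, base + size
        return True

core = Core(smram_base=0x1000, smram_size=0x100)
core.in_smm = True
core.write(0x1010, 7)        # SMM code initializes its data
core.in_smm = False
core.write(0x1010, 9)        # ignored outside SMM
```

Reading address `0x1010` outside SMM returns the fixed value, while the same read in SMM returns the stored `7`.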
Typically, each core 102 has its own instance of SMM program code and data in memory. It is desirable to protect the SMM program code and data of each core 102 not only from program code running on that core but also from program code running on another core 102. To accomplish this using the SMRRs 3102, system software typically places the multiple SMM program code and data instances in adjacent blocks of memory. That is, the SMRAM region is a single contiguous memory region that includes all the SMM program code and data instances. If the SMRRs 3102 of all the cores 102 of the microprocessor 100 hold values that specify the entire single contiguous memory region that includes all the SMM program code and data instances, this prevents program code running on one core in non-SMM from updating the SMM program code and data instance of another core 102. If a time window exists in which the values of the SMRRs 3102 differ among the cores 102, that is, the SMRRs 3102 of different cores 102 of the microprocessor 100 hold different values, any of which specifies less than the entire single contiguous memory region that includes all the SMM program code and data instances, then the system may be vulnerable to a security attack, which may be serious given the nature of SMM. Embodiments that atomically propagate updates to the SMRRs 3102 may therefore be particularly advantageous.
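The window of vulnerability can be made concrete with a small check. This sketch is illustrative only: address ranges are modeled as half-open `(low, high)` pairs, and the function `exposed_instances` and the example layout are hypothetical.

```python
def exposed_instances(smrr_by_core, instances):
    """List (core, owner) pairs where core's SMRR does not cover owner's SMM instance.

    smrr_by_core: core id -> (low, high) range currently held in that core's SMRR.
    instances: owner core id -> (low, high) range of that owner's SMM code and data.
    """
    exposed = []
    for core, (low, high) in smrr_by_core.items():
        for owner, (inst_low, inst_high) in instances.items():
            if not (low <= inst_low and inst_high <= high):
                exposed.append((core, owner))   # non-SMM code on core could write it
    return sorted(exposed)

# Two adjacent SMM instances forming one contiguous SMRAM region.
instances = {0: (0x1000, 0x2000), 1: (0x2000, 0x3000)}

consistent = exposed_instances({0: (0x1000, 0x3000), 1: (0x1000, 0x3000)}, instances)
# During a non-atomic update, core 1 still holds a value covering only part of the region.
in_window = exposed_instances({0: (0x1000, 0x3000), 1: (0x1000, 0x2000)}, instances)
```

With consistent values nothing is exposed; in the window, the second instance is unprotected against non-SMM code running on core 1.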
In addition, other embodiments are contemplated in which updates to other per-core instantiated architecturally visible storage resources of the microprocessor 100 are propagated atomically in a manner similar to that described above. For example, in one embodiment each core 102 instantiates certain bit fields of the x86 IA32_MISC_ENABLE MSR, and a WRMSR executed on one core 102 is propagated to all the cores 102 of the microprocessor 100 in a manner similar to that described above. Furthermore, embodiments are contemplated in which the execution, on one core 102, of a WRMSR to other MSRs that are instantiated on all the cores 102 of the microprocessor 100, whether architectural or proprietary and whether current or future, is propagated to all the cores 102 of the microprocessor 100 in a manner similar to that described above.
In addition, although embodiments have been described in which the per-core instantiated architecturally visible storage resource is an MTRR, other embodiments are contemplated in which the per-core instantiated resources are resources of instruction set architectures other than the x86 ISA and are resources other than MTRRs. For example, other resources besides MTRRs include CPUID values and MSRs that report capabilities, such as Vectored Multimedia eXtensions (VMX) capabilities.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the claims of this application. For example, software can enable the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished using general programming languages (for example, C and C++) or hardware description languages (HDL), including Verilog HDL, VHDL and so on. Such software can be embodied as program code in tangible media, such as any machine-readable (for example, computer-readable) storage medium, including semiconductor memory, magnetic disk, hard disk or optical disc (for example, CD-ROM or DVD-ROM), wherein, when the program code is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. The methods and apparatus of the invention may also be transmitted as program code over a transmission medium, such as electrical wire or cable, optical fiber or any other form of transmission, wherein, when the program code is received, loaded and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to application-specific logic circuits. The apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (for example, embodied in HDL), and transformed into hardware in the production of integrated circuits. In addition, the apparatus and methods described herein may be embodied as a combination of hardware and software. The scope of protection of the invention is therefore defined by the claims of this application. Finally, those skilled in the art may, on the basis of the disclosed concepts and specific embodiments, make minor changes and refinements to achieve the same purposes of the invention without departing from its spirit and scope.

Claims (21)

  1. A microprocessor, characterized by comprising:
    a plurality of processing cores, wherein each processing core of the plurality of processing cores instantiates a respective architecturally visible storage resource;
    wherein a first processing core of the plurality of processing cores is configured to:
    encounter an architectural instruction, wherein the architectural instruction instructs the first processing core to update the respective architecturally visible storage resource of the first processing core with a value specified by the architectural instruction; and
    in response to encountering the architectural instruction, send an update message to each other processing core of the plurality of processing cores, provide the value to each other processing core of the plurality of processing cores, and update the respective architecturally visible storage resource of the first processing core with the value,
    wherein each processing core of the plurality of processing cores other than the first processing core is configured to, in response to the update message and without encountering the architectural instruction, update the respective architecturally visible storage resource of that processing core with the value provided by the first processing core.
  2. The microprocessor according to claim 1, characterized in that
    the first processing core is further configured to generate an interrupt request to each other processing core of the plurality of processing cores in response to encountering the architectural instruction; and
    each processing core of the plurality of processing cores other than the first processing core is configured to update the respective architecturally visible storage resource of that processing core with the value provided by the first processing core in response to the interrupt request from the first processing core.
  3. The microprocessor according to claim 2, characterized in that the first processing core is configured to disable interrupts in response to encountering the architectural instruction until after all of the plurality of processing cores have updated their respective architecturally visible storage resources with the value; and
    each other processing core of the plurality of processing cores is configured to disable interrupts in response to the interrupt request from the first processing core until after all of the plurality of processing cores have updated their respective architecturally visible storage resources with the value.
  4. The microprocessor according to claim 2, characterized in that the interrupt request comprises a non-architectural interrupt request.
  5. The microprocessor according to claim 1, characterized by further comprising:
    a hardware semaphore shared by the plurality of processing cores,
    wherein the first processing core is further configured to obtain ownership of the hardware semaphore before providing the value.
  6. The microprocessor according to claim 1, characterized in that each processing core of the plurality of processing cores comprises microcode, wherein the microcode of the first processing core is configured to provide the value, and the microcode of each processing core of the plurality of processing cores is configured to update the respective architecturally visible storage resource with the value.
  7. The microprocessor according to claim 1, characterized by further comprising:
    a memory shared by the plurality of processing cores,
    wherein, to provide the value, the first processing core is configured to write the value to the shared memory.
  8. The microprocessor according to claim 1, characterized by further comprising:
    an uncore control unit shared by the plurality of processing cores,
    wherein each processing core of the plurality of processing cores is configured to generate a synchronization request to the uncore control unit before updating its respective architecturally visible storage resource with the value.
  9. The microprocessor according to claim 1, characterized by further comprising:
    an uncore control unit shared by the plurality of processing cores,
    wherein each processing core of the plurality of processing cores is configured to generate a synchronization request to the uncore control unit after updating its respective architecturally visible storage resource with the value; and
    each processing core of the plurality of processing cores is configured to wait to enable interrupts until all of the plurality of processing cores have generated the synchronization request.
  10. The microprocessor according to claim 1, characterized in that the respective architecturally visible storage resource comprises an x86 architecture memory type range register (MTRR).
  11. The microprocessor according to claim 10, characterized in that the memory type range register comprises an x86 architecture system management range register (SMRR).
  12. The microprocessor according to claim 10, characterized in that
    the memory type range register comprises an x86 architecture variable-range memory type range register, wherein the x86 architecture variable-range memory type range register comprises a base register and a mask register, and wherein the mask register includes a valid bit;
    wherein, in response to encountering the architectural instruction, the first processing core is further configured to:
    if the architectural instruction specifies the mask register, provide a current value of the base register in the first processing core to each other processing core of the plurality of processing cores and set a flag readable by each other processing core of the plurality of processing cores; and
    if the architectural instruction specifies the base register and the valid bit of the mask register in the first processing core is currently set, provide the current value of the mask register in the first processing core to each other processing core of the plurality of processing cores and set the flag; and
    otherwise, clear the flag,
    wherein, to update with the value, each processing core of the plurality of processing cores updates both the base register and the mask register if the flag is set, and otherwise updates only the base register or the mask register specified by the architectural instruction.
  13. The microprocessor according to claim 1, characterized in that the respective architecturally visible storage resource comprises an x86 architecture model specific register (MSR).
  14. The microprocessor according to claim 1, characterized in that each processing core of the plurality of processing cores is configured to operate both as the first processing core and as one of the other processing cores of the plurality of processing cores.
  15. A method performed in a microprocessor, characterized in that the microprocessor has a plurality of processing cores, wherein each processing core of the plurality of processing cores instantiates a respective architecturally visible storage resource, the method comprising:
    encountering, by a first processing core of the plurality of processing cores, an architectural instruction, wherein the architectural instruction instructs the first processing core to update the respective architecturally visible storage resource of the first processing core with a value specified by the architectural instruction;
    sending update information from the first processing core to each other processing core of the plurality of processing cores, in response to encountering the architectural instruction;
    providing, by the first processing core, the value to each other processing core of the plurality of processing cores, in response to encountering the architectural instruction;
    updating, by the first processing core, the respective architecturally visible storage resource of the first processing core with the value, in response to encountering the architectural instruction; and
    updating, by each processing core of the plurality of processing cores other than the first processing core, the respective architecturally visible storage resource of that processing core with the value provided by the first processing core, without encountering the architectural instruction, in response to the update information.
  16. The method according to claim 15, characterized by further comprising:
    generating, by the first processing core, an interrupt request to each other processing core of the plurality of processing cores, in response to encountering the architectural instruction; and
    performing the updating step in response to receiving the interrupt request.
  17. The method according to claim 16, characterized by further comprising:
    disabling interrupts, by the first processing core, in response to encountering the architectural instruction, until after all of the plurality of processing cores have updated their respective architecturally visible storage resources with the value; and
    disabling interrupts, by each other processing core of the plurality of processing cores, in response to receiving the interrupt request from the first processing core, until after all of the plurality of processing cores have updated their respective architecturally visible storage resources with the value.
  18. The method according to claim 15, characterized in that the microprocessor further comprises a hardware semaphore shared by the plurality of processing cores, the method further comprising:
    obtaining, by the first processing core, ownership of the hardware semaphore before providing the value to each other processing core of the plurality of processing cores.
  19. The method according to claim 15, characterized in that the step of providing the value is performed by microcode of the first processing core, and the step of updating is performed by microcode of each processing core of the plurality of processing cores.
  20. The method according to claim 15, characterized in that the microprocessor further comprises a memory shared by the plurality of processing cores, wherein the step of providing the value comprises:
    writing, by the first processing core, the value to the shared memory.
  21. The method according to claim 15, characterized in that the microprocessor further comprises an uncore control unit shared by the plurality of processing cores, the method further comprising:
    generating, by each processing core of the plurality of processing cores, a synchronization request to the uncore control unit before updating the respective architecturally visible storage resource of that processing core with the value.
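
The propagation flow recited in claims 15 to 21 can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the patented microcode: the names `Core`, `Uncore`, `propagate_update`, `service_update_interrupt` and `apply_update` are hypothetical, the hardware semaphore, shared memory and synchronization tracking are modeled as plain Python fields, and the interrupt is modeled as a direct function call. The register address 0x200 and the value 0x6 in the demo are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Core:
    """One processing core with its own copy of the architectural registers."""
    core_id: int
    registers: Dict[int, int] = field(default_factory=dict)
    interrupts_enabled: bool = True


@dataclass
class Uncore:
    """Hypothetical model of the resources shared by all cores."""
    semaphore_owner: Optional[int] = None                 # hardware semaphore (claim 18)
    shared_memory: Dict[str, int] = field(default_factory=dict)  # shared memory (claim 20)
    synced: Set[int] = field(default_factory=set)         # cores that have updated


def apply_update(core: Core, uncore: Uncore) -> None:
    """Update this core's register from shared memory, then report completion."""
    core.registers[uncore.shared_memory["addr"]] = uncore.shared_memory["value"]
    uncore.synced.add(core.core_id)


def service_update_interrupt(core: Core, uncore: Uncore) -> None:
    """What a non-initiating core does on the interrupt request (claims 16, 17)."""
    core.interrupts_enabled = False
    apply_update(core, uncore)


def propagate_update(cores: List[Core], uncore: Uncore,
                     initiator_id: int, addr: int, value: int) -> None:
    """The initiating core encounters the instruction and propagates the value."""
    initiator = cores[initiator_id]
    if uncore.semaphore_owner is not None:
        raise RuntimeError("another core already owns the semaphore")
    uncore.semaphore_owner = initiator_id          # take ownership first (claim 18)
    initiator.interrupts_enabled = False           # claim 17
    uncore.shared_memory = {"addr": addr, "value": value}  # provide the value (claim 20)
    uncore.synced = set()
    for core in cores:                             # interrupt every other core (claim 16)
        if core.core_id != initiator_id:
            service_update_interrupt(core, uncore)
    apply_update(initiator, uncore)                # update the initiator's own copy
    # Interrupts stay disabled until every core has updated (claim 17).
    if uncore.synced == {core.core_id for core in cores}:
        for core in cores:
            core.interrupts_enabled = True
    uncore.semaphore_owner = None


if __name__ == "__main__":
    demo_cores = [Core(i) for i in range(4)]
    propagate_update(demo_cores, Uncore(), initiator_id=1, addr=0x200, value=0x6)
    print([hex(core.registers[0x200]) for core in demo_cores])
```

In this sketch only the initiating core "sees" the write; the other cores pick the value up from the shared memory when interrupted, which mirrors the claim language that they update without encountering the architectural instruction themselves.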
CN201710978680.8A 2013-08-28 2014-08-28 Microprocessor and execution method thereof Active CN107729055B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US201361871206P 2013-08-28 2013-08-28
US61/871,206 2013-08-28
US201361916338P 2013-12-16 2013-12-16
US61/916,338 2013-12-16
US14/281,796 US9575541B2 (en) 2013-08-28 2014-05-19 Propagation of updates to per-core-instantiated architecturally-visible storage resource
US14/281,796 2014-05-19
CN201410431675.1A CN104238997B (en) 2013-08-28 2014-08-28 Microprocessor and its execution method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201410431675.1A Division CN104238997B (en) 2013-08-28 2014-08-28 Microprocessor and its execution method

Publications (2)

Publication Number Publication Date
CN107729055A true CN107729055A (en) 2018-02-23
CN107729055B CN107729055B (en) 2020-08-11

Family

ID=52227150

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201410431675.1A Active CN104238997B (en) 2013-08-28 2014-08-28 Microprocessor and its execution method
CN201710978680.8A Active CN107729055B (en) 2013-08-28 2014-08-28 Microprocessor and execution method thereof

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201410431675.1A Active CN104238997B (en) 2013-08-28 2014-08-28 Microprocessor and its execution method

Country Status (1)

Country Link
CN (2) CN104238997B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817777A (en) * 2020-07-23 2020-10-23 广州优加智联技术有限公司 Idle optical fiber resource on-line monitoring system
CN115237475A (en) * 2022-06-23 2022-10-25 云南大学 Forth multi-core stack processor and instruction set

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114651237A (en) * 2019-10-24 2022-06-21 北京希姆计算科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
US20240160449A1 (en) * 2021-03-29 2024-05-16 SiFive, Inc. Configurable interconnect address remapper with event recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4484303A (en) * 1979-06-19 1984-11-20 Gould Inc. Programmable controller
US20050138249A1 (en) * 2003-12-18 2005-06-23 Galbraith Mark J. Inter-process communication mechanism
CN1632744A (en) * 2004-01-21 2005-06-29 智权第一公司 Mechanism in a microprocessor for executing native instructions directly from memory and method
US20090172369A1 (en) * 2007-12-27 2009-07-02 Stillwell Jr Paul M Saving and restoring architectural state for processor cores
US20090271601A1 (en) * 2008-04-25 2009-10-29 Zimmer Vincent J Method, device, and system for pre-memory symmetric multiprocessing flow
US8359462B1 (en) * 2007-11-21 2013-01-22 Marvell International Ltd. Method and apparatus for programmable coupling between CPU and co-processor

Also Published As

Publication number Publication date
CN104238997A (en) 2014-12-24
CN107729055B (en) 2020-08-11
CN104238997B (en) 2019-02-15

Similar Documents

Publication Publication Date Title
CN104462004B (en) Microprocessor and method for synchronizing operations among its processing cores
CN104216680B (en) Microprocessor and its execution method
TWI637316B (en) Dynamic reconfiguration of multi-core processor
CN104331388B (en) Microprocessor and method for synchronizing processing cores in a microprocessor
CN104238997B (en) Microprocessor and its execution method
CN104216679B (en) Microprocessor and its execution method
CN104360727B (en) Microprocessor and power saving method using the same
CN104239274B (en) Microprocessor and its configuration method
CN104239275B (en) Multi-core microprocessor and its relocation method
CN104331387B (en) Microprocessor and its configuration method
CN104239273B (en) Microprocessor and its execution method
CN104239272B (en) Microprocessor and its operating method
CN104216861B (en) Microprocessor and method for synchronizing processing cores in the microprocessor
EP3324288A1 (en) Multi-core microprocessor that dynamically designates one of its processing cores as the bootstrap processor
EP2843550B1 (en) Dynamic reconfiguration of mulit-core processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant