TW200401194A

TW200401194A - Method and apparatus for determining a processor state without interrupting processor operation

Info

Publication number: TW200401194A
Application number: TW92118292A
Authority: TW
Inventors: Timothy J Wood; Scott A White
Original assignee: Advanced Micro Devices Inc
Priority date: 2002-07-11
Filing date: 2003-07-04
Publication date: 2004-01-16
Also published as: CN1669004A; AU2003261128A1; AU2003261128A8; EP1576475A2; WO2004008319A3; WO2004008319A2; JP2006514349A

Abstract

A method and apparatus for determining an internal state of a host processor. Test data may be loaded into an output port of a service processor. The service processor may poll a valid bit stored in the host processor. Upon determining that the valid bit is clear, the service processor may transmit the test data to the host processor and set the valid bit. State data may be generated responsive to the test data. The state data may be written into an output port of the host processor. The service processor may receive the state data from the output port of the host processor. The operation of determining the state of the host processor is performed without interrupting the execution of instructions in the host processor.
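The valid-bit handshake summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: each direction of transfer is modeled as a single register, the names `PortRegister`, `try_send`, `try_receive`, `query_host`, and `make_state` are hypothetical, and clearing the valid bit when the receiver reads the word is an assumption that the text does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PortRegister:
    """Hypothetical model of a port register: a data word plus a valid bit."""
    data: int = 0
    valid: bool = False


def try_send(dest: PortRegister, word: int) -> bool:
    """Sender side: write `word` only if the destination's valid bit is clear."""
    if dest.valid:
        return False  # earlier word not yet consumed; the sender keeps polling
    dest.data = word
    dest.valid = True  # set once the transfer is complete
    return True


def try_receive(src: PortRegister) -> Optional[int]:
    """Receiver side: read the word only if the valid bit is set."""
    if not src.valid:
        return None
    word = src.data
    src.valid = False  # assumption: reading frees the register for the next word
    return word


def query_host(test_word: int,
               host_in: PortRegister,
               service_in: PortRegister,
               make_state: Callable[[int], int]) -> Optional[int]:
    """One query round trip: test data in, state data out."""
    # Service processor: poll the host-side valid bit, then transmit.
    if not try_send(host_in, test_word):
        return None
    # Host processor: pick up the test data and produce state data.
    received = try_receive(host_in)
    # Host processor: hand the state data back toward the service processor.
    if not try_send(service_in, make_state(received)):
        return None
    # Service processor: collect the state data.
    return try_receive(service_in)
```

In this model neither side ever blocks: a failed `try_send` or `try_receive` simply returns, which mirrors the idea that the host polls a bit rather than taking an interrupt.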

Description

[Technical Field of the Invention]

This invention relates to processors, and more particularly to a method and apparatus for determining the state of a processor.

[Prior Art]

When designing new computer systems and system software, it may be desirable to determine how various hardware mechanisms behave while the system processor executes instructions. One approach to observing hardware behavior involves determining the processor state at various stages of program execution. Determining the processor state may involve querying the processor by transferring data to it. The data received by the processor may then be acted upon, producing additional data that indicates the processor state. Such data may include register contents, reservation station contents, and so on. The additional data may then be transferred from the processor for observation. In addition, a processor may be configured to periodically transfer data to an external source, where the data may be used to determine a processor state or other information.

Various tools have been developed for querying a processor in order to determine its state. Such tools may input test data to the processor, generate state data in response to the processor receiving the test data, and transfer the state data from the processor for observation. One drawback of such tools involves the execution of instructions while the processor is being queried.

Such tools typically require that the operation of the processor be interrupted during the query. The instruction stream that the processor is executing may have to be interrupted in order to input the test data, generate the state data, and output the state data. The input of test data and the output of state data may need to use the processor's existing inputs and outputs. Furthermore, querying the processor through those inputs and outputs may require the processor to switch to an alternate service routine, which may stall the instruction stream currently being executed. These factors may make it difficult to ascertain certain aspects of execution speed. In addition, switching to an alternate service routine may make it impossible to determine the true processor state by means of a processor query.

[Summary of the Invention]

A method and apparatus for determining the state of a host processor are disclosed. In one embodiment, a service processor may poll a valid bit stored in a register of a receiver in the host processor. Upon determining that the valid bit is clear, the service processor may load test data into an output register, transfer the test data to the host processor, and set the valid bit upon completion of the transfer. The host processor may receive the test data from the register of the receiver responsive to determining that the valid bit is set. State data indicative of the state of the host processor may be generated responsive to the test data. The host processor may poll a valid bit in a register of a transmitter in its output port and, responsive to detecting that the valid bit is clear, place the data in the register of the transmitter. The data may then be transferred to the receiver of the service processor, and the valid bit may be set responsive to the transfer. Transferring data to the host processor and retrieving data from the host processor may be performed without interrupting the execution of instructions. In addition, the service processor and the host processor may transmit and receive data independently of one another.

In one embodiment, the host processor includes an input port and an output port. The input port of the host processor may be configured to be coupled to an output port of the service processor, while the output port may be configured to be coupled to an input port of the service processor. The input and output ports of the host processor and the service processor may each include a register configured to store a number of bits. Each register may be configured to store a valid bit which, when set, indicates that the stored data is valid. The registers in the input ports may be configured to receive data only when the valid bit is clear. When data is transferred from an output port to an input port, the valid bit may be set in the register of the input port. The host processor and the service processor may be configured to retrieve data from the registers of their respective input ports responsive to detecting that the valid bit is set.

In one embodiment of the service processor, data may be loaded into the register of the output port, or retrieved from the input port, through a boundary scan test access port (TAP) conforming to the IEEE 1149.1 standard. Data may be shifted serially into the register of the service processor output port through a Test Data In (TDI) pin. Similarly, data may be shifted serially out of the register of the service processor input port through a Test Data Out (TDO) pin.

In various embodiments, the various combinations of input and output ports may operate independently of one another. In other words, the transfer of data from the output port of the service processor to the input port of the host processor is unrelated to any transfer of data from the output port of the host processor to the input port of the service processor. Furthermore, in various embodiments, data may be transferred into and out of the processor at fixed intervals, or data may be transferred from the processor in response to an individual query.

[Detailed Description]

Referring now to FIG. 1, a block diagram of one embodiment of a processor (10) is shown. Other embodiments are possible and contemplated. As shown in FIG. 1, processor (10) includes a prefetch/predecode unit (12), a branch prediction unit (14), an instruction cache (16), an instruction alignment unit (18), a plurality of decode units (20A to 20C), a plurality of reservation stations (22A to 22C), a plurality of functional units (24A to 24C), a load/store unit (26), a data cache (28), a register file (30), a reorder buffer (32), an MROM unit (34), and a bus interface unit (37). Elements referred to with a particular reference number followed by a letter are collectively referred to by the reference number alone. For example, decode units (20A to 20C) are collectively referred to as decode units (20).

Prefetch/predecode unit (12) is coupled to receive instructions from bus interface unit (37), and is further coupled to instruction cache (16) and branch prediction unit (14). Similarly, branch prediction unit (14) is coupled to instruction cache (16). Branch prediction unit (14) is also coupled to decode units (20) and functional units (24). Instruction cache (16) is further coupled to MROM unit (34) and instruction alignment unit (18). Instruction alignment unit (18) is in turn coupled to decode units (20). Each decode unit (20A to 20C) is coupled to load/store unit (26) and to a respective reservation station (22A to 22C). Reservation stations (22A to 22C) are further coupled to respective functional units (24A to 24C). Additionally, decode units (20) and reservation stations (22) are coupled to register file (30) and reorder buffer (32). Functional units (24) are also coupled to load/store unit (26), register file (30), and reorder buffer (32).
Data cache (28) is coupled to load/store unit (26) and bus interface unit (37). Bus interface unit (37) is further coupled to an L2 interface to an L2 cache, and to a bus. Finally, MROM unit (34) is coupled to decode units (20).

Instruction cache (16) is a high-speed cache memory provided to store instructions. Instructions are fetched from instruction cache (16) and dispatched to decode units (20). In one embodiment, instruction cache (16) is configured to store up to 64 kilobytes of instructions in a 2-way set associative structure having 64-byte lines (a byte comprises 8 binary bits). Alternatively, any other desired configuration and size may be employed. For example, it is noted that instruction cache (16) may be implemented as a fully associative, set associative, or direct mapped configuration.

Instructions are stored into instruction cache (16) by prefetch/predecode unit (12). Instructions may be prefetched, in accordance with a prefetch scheme, before they are requested from instruction cache (16). A variety of prefetch schemes may be employed by prefetch/predecode unit (12). As prefetch/predecode unit (12) transfers instructions to instruction cache (16), it may generate predecode data corresponding to the instructions. For example, in one embodiment, prefetch/predecode unit (12) generates three predecode bits for each byte of the instructions: a start bit, an end bit, and a functional bit. The predecode bits form tags indicative of the boundaries of each instruction.
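As a worked illustration of the instruction cache geometry given above (64 kilobytes, 2-way set associative, 64-byte lines), the following sketch derives the number of sets and splits an address into tag, index, and offset. The function names are hypothetical, power-of-two sizes are assumed, and the tag/index/offset split is the conventional textbook one rather than anything the text specifies.

```python
def cache_geometry(capacity_bytes: int, ways: int, line_bytes: int):
    """Return (sets, offset_bits, index_bits) for a set associative cache.

    Assumes capacity, line size, and the resulting set count are powers of two.
    """
    sets = capacity_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 64 KB / (2 ways * 64 bytes) = 512 sets, so 6 offset bits and 9 index bits.
sets, offset_bits, index_bits = cache_geometry(64 * 1024, 2, 64)
print(sets, offset_bits, index_bits)
print(split_address(0x12345, offset_bits, index_bits))
```

With these parameters the low 6 address bits select a byte within a line and the next 9 bits select one of 512 sets, each holding two lines.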
The predecode tags may also convey additional information, such as whether a given instruction can be decoded directly by decode units (20) or whether the instruction is executed by invoking a microcode procedure controlled by MROM unit (34). Still further, prefetch/predecode unit (12) may be configured to detect branch instructions and to store branch prediction information corresponding to the branch instructions into branch prediction unit (14). Other embodiments may employ any suitable predecode scheme, or no predecode scheme, as desired.

One encoding of the predecode tags for an embodiment of processor (10) employing a variable byte length instruction set will next be described. A variable byte length instruction set is an instruction set in which different instructions may occupy differing numbers of bytes. An exemplary variable byte length instruction set employed by one embodiment of processor (10) is the x86 instruction set.

In the exemplary encoding, if a given byte is the first byte of an instruction, the start bit for that byte is set. If the byte is the last byte of an instruction, the end bit for that byte is set. Instructions which may be directly decoded by decode units (20) are referred to as "fast path" instructions. The remaining x86 instructions are referred to as MROM instructions, according to one embodiment. For fast path instructions, the functional bit is set for each prefix byte included in the instruction, and cleared for other bytes. Alternatively, for MROM instructions, the functional bit is cleared for each prefix byte and set for other bytes. The type of instruction may be determined by examining the functional bit corresponding to the end byte. If that functional bit is clear, the instruction is a fast path instruction. Conversely, if that functional bit is set, the instruction is an MROM instruction.
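The encoding just described can be sketched in a few lines. This is an illustrative model only: the function names `predecode` and `classify` are hypothetical, bits are listed first byte to last byte, and the instruction is assumed to have at least one non-prefix byte so that the end byte is never a prefix.

```python
def predecode(length: int, num_prefix: int, is_mrom: bool):
    """Return (start_bits, end_bits, functional_bits), one bit per instruction byte.

    Assumes 0 <= num_prefix < length, i.e. the last byte is not a prefix byte.
    """
    start = [1 if i == 0 else 0 for i in range(length)]
    end = [1 if i == length - 1 else 0 for i in range(length)]
    # Fast path: functional bit set on prefix bytes, clear elsewhere.
    # MROM: the opposite polarity.
    functional = [int((i < num_prefix) != is_mrom) for i in range(length)]
    return start, end, functional


def classify(end_bits, functional_bits) -> str:
    """Decide the instruction type from the functional bit of the end byte."""
    last = end_bits.index(1)
    return "MROM" if functional_bits[last] else "fast path"


# A five-byte fast path instruction with two prefix bytes.
start, end, functional = predecode(5, 2, is_mrom=False)
print(start, end, functional, classify(end, functional))
```

For a five-byte fast path instruction with two prefix bytes, the model yields start bits 10000, end bits 00001, and functional bits 11000; the first clear functional bit then marks the opcode byte.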
The opcode of an instruction which may be directly decoded by decode units (20) may thereby be located as the byte associated with the first clear functional bit in the instruction. For example, a fast path instruction including two prefix bytes, a Mod R/M byte, and an immediate byte would have start, end, and functional bits as follows:

Start bits 10000
End bits 00001
Functional bits 11000

MROM instructions are instructions which are determined to be too complex for decode by decode units (20). MROM instructions are executed by invoking MROM unit (34). More specifically, when an MROM instruction is encountered, MROM unit (34) parses the instruction and issues it as a subset of defined fast path instructions to effectuate the desired operation. MROM unit (34) dispatches the subset of fast path instructions to decode units (20).

Processor (10) employs branch prediction in order to speculatively fetch instructions subsequent to conditional branch instructions. Branch prediction unit (14) is included to perform branch prediction operations. In one embodiment, branch prediction unit (14) employs a branch target buffer which caches up to two branch target addresses and corresponding taken/not taken predictions per 16-byte portion of a cache line in instruction cache (16). The branch target buffer may, for example, comprise 2048 entries or any other suitable number of entries. Prefetch/predecode unit (12) determines initial branch targets when a particular line is predecoded. Subsequent updates to the branch targets corresponding to a cache line may occur due to the execution of instructions within the cache line.
Instruction cache (16) provides an indication of the instruction address being fetched, so that branch prediction unit (14) may determine which branch target addresses to select for forming a branch prediction. Decode units (20) and functional units (24) provide update information to branch prediction unit (14). Decode units (20) detect branch instructions which were not predicted by branch prediction unit (14). Functional units (24) execute the branch instructions and determine whether the predicted branch direction is incorrect. The branch direction may be "taken", in which case subsequent instructions are fetched from the target address of the branch instruction. Conversely, the branch direction may be "not taken", in which case subsequent instructions are fetched from memory locations consecutive to the branch instruction. When a mispredicted branch instruction is detected, instructions subsequent to the mispredicted branch are discarded from the various units of processor (10). In an alternative configuration, branch prediction unit (14) may be coupled to reorder buffer (32) instead of decode units (20) and functional units (24), and may receive branch misprediction information from reorder buffer (32). A variety of suitable branch prediction algorithms may be employed by branch prediction unit (14).

Instructions fetched from instruction cache (16) are conveyed to instruction alignment unit (18). As instructions are fetched from instruction cache (16), the corresponding predecode data is scanned to provide information to instruction alignment unit (18) (and to MROM unit (34)) regarding the instructions being fetched. Instruction alignment unit (18) utilizes the scanning data to align an instruction to each of decode units (20). In one embodiment, instruction alignment unit (18) aligns instructions from three sets of eight instruction bytes to decode units (20).
Decode unit (20A) receives an instruction which is prior (in program order) to the instructions concurrently received by decode units (20B) and (20C). Similarly, decode unit (20B) receives an instruction which is prior in program order to the instruction concurrently received by decode unit (20C). In some embodiments (e.g., embodiments employing fixed-length instruction sets), instruction alignment unit (18) may be eliminated.

Decode units (20) are configured to decode instructions received from instruction alignment unit (18). Register operand information is detected and routed to register file (30) and reorder buffer (32). Additionally, if the instructions require one or more memory operations to be performed, decode units (20) dispatch the memory operations to load/store unit (26). Each instruction is decoded into a set of control values for functional units (24), and these control values are dispatched to reservation stations (22) along with operand address information and displacement or immediate data which may be included with the instruction. In one particular embodiment, each instruction is decoded into up to two operations which may be separately executed by functional units (24A to 24C).

Processor (10) supports out of order execution, and thus employs reorder buffer (32) to keep track of the original program sequence for register read and write operations, to implement register renaming, to allow for speculative instruction execution and branch misprediction recovery, and to facilitate precise exceptions. A temporary storage location within reorder buffer (32) is reserved upon decode of an instruction that involves the update of a register, to thereby store speculative register states.

If a branch prediction is incorrect, the results of speculatively executed instructions along the mispredicted path can be invalidated in the buffer before they are written to register file (30). Similarly, if a particular instruction causes an exception, instructions subsequent to the particular instruction may be discarded. In this manner, exceptions are "precise" (i.e., instructions subsequent to the particular instruction causing the exception are not completed prior to the exception).
It is noted that a particular instruction is speculatively executed if it is executed prior to instructions which precede the particular instruction in program order. A preceding instruction may be a branch instruction or an exception-causing instruction, in which case the speculative results may be discarded by reorder buffer (32).

The decoded instructions provided at the outputs of decode units (20) are routed directly to respective reservation stations (22). In one embodiment, each reservation station (22) is capable of holding instruction information (e.g., decoded instructions as well as operand values, operand tags, and/or immediate data) for up to six pending instructions awaiting issue to the corresponding functional unit. It is noted that for the embodiment of FIG. 1, each reservation station (22) is associated with a dedicated functional unit (24). Accordingly, three dedicated "issue positions" are formed by reservation stations (22) and functional units (24). In other words, issue position 0 is formed by reservation station (22A) and functional unit (24A). Instructions aligned and dispatched to reservation station (22A) are executed by functional unit (24A). Similarly, issue position 1 is formed by reservation station (22B) and functional unit (24B), and issue position 2 is formed by reservation station (22C) and functional unit (24C).

Upon decode of a particular instruction, if a required operand is a register location, register address information is routed to reorder buffer (32) and register file (30) simultaneously. Register file (30) comprises storage locations for each of the architected registers included in the instruction set. Additional storage locations may be included within register file (30) for use by MROM unit (34). Reorder buffer (32) contains temporary storage locations for results which change the contents of these registers, to thereby allow out of order execution.
A temporary storage location of reorder buffer (32) is reserved for each instruction which, upon decode, is determined to modify the contents of one of the real registers. Therefore, at various points during execution of a particular program, reorder buffer (32) may have one or more locations which contain the speculatively executed contents of a given register. If, following decode of a given instruction, it is determined that reorder buffer (32) has a previous location or locations assigned to a register used as an operand in the given instruction, reorder buffer (32) forwards to the corresponding reservation station either: (1) the value in the most recently assigned location, or (2) a tag for the most recently assigned location, if the value has not yet been produced by the functional unit that will eventually execute the previous instruction. If reorder buffer (32) has a location reserved for a given register, the operand value (or reorder buffer tag) is provided from reorder buffer (32) rather than from register file (30). If there is no location reserved for a required register in reorder buffer (32), the value is taken directly from register file (30). If the operand corresponds to a memory location, the operand value is provided to the reservation station through load/store unit (26).
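The register operand lookup rule above can be sketched as follows. This is a simplified software model, not the hardware described: `RobEntry`, `read_operand`, the list-based buffer ordered oldest to newest, and the dictionary standing in for the register file are all hypothetical, and memory operands are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RobEntry:
    """Hypothetical reorder buffer location reserved for one register update."""
    tag: int
    dest_reg: str
    value: Optional[int] = None  # None until a functional unit produces the result


def read_operand(reg: str,
                 rob: List[RobEntry],
                 register_file: Dict[str, int]) -> Tuple[str, int]:
    """Return ("value", v) or ("tag", t) for a register operand.

    `rob` is ordered oldest to newest, so the most recently assigned
    location for `reg` is the last matching entry.
    """
    for entry in reversed(rob):
        if entry.dest_reg == reg:
            if entry.value is not None:
                return ("value", entry.value)  # case (1): result already produced
            return ("tag", entry.tag)          # case (2): result still pending
    # No location reserved in the buffer: read the register file directly.
    return ("value", register_file[reg])


rob = [RobEntry(tag=0, dest_reg="EAX", value=5), RobEntry(tag=1, dest_reg="EAX")]
register_file = {"EAX": 1, "EBX": 7}
print(read_operand("EAX", rob, register_file))  # newest EAX entry is still pending
print(read_operand("EBX", rob, register_file))  # falls through to the register file
```

A reservation station holding a tag rather than a value would wait for a result carrying that tag before the instruction could be selected for execution.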

to store and manipulate concurrently decoded instructions as a unit. This configuration is referred to herein as "line-oriented". By manipulating several instructions together, the hardware employed within reorder buffer (32) may be simplified. For example, a line-oriented reorder buffer included in the present embodiment allocates storage sufficient for the instruction information of three instructions (one from each decode unit (20)) whenever one or more instructions are dispatched by decode units (20). By contrast, a conventional reorder buffer allocates a variable amount of storage, depending on the number of instructions actually dispatched, and a comparatively larger number of logic gates may be required to allocate that variable amount of storage. When each of the concurrently decoded instructions has executed, the instruction results are stored into register file (30). The storage is then free for allocation to another set of concurrently decoded instructions. Additionally, the amount of control logic employed per instruction is reduced, because the control logic is amortized over several concurrently decoded instructions. A reorder buffer tag identifying a particular instruction may be divided into two fields: a line tag and an offset tag. The line tag identifies the set of concurrently decoded instructions that includes the particular instruction, and the offset tag identifies which instruction within the set corresponds to the particular instruction. It is noted that storing instruction results into register file (30) and freeing the corresponding storage is referred to as "retiring" the instructions. It is further noted that any reorder buffer configuration may be employed in various embodiments of processor (10).

As noted above, reservation stations (22) store instructions until the instructions are executed by the corresponding functional unit (24).
An instruction is selected for execution if: (i) the operands of the instruction have been provided; and (ii) the operands have not yet been provided for those instructions that are within the same reservation station (22A to 22C) and that precede the instruction in program order. It is noted that when an instruction is executed by one of the functional units (24), the result of that instruction is passed directly to any reservation stations (22) that are waiting for that result at the same time that the result is passed to update reorder buffer (32) (this technique is commonly referred to as "result forwarding"). An instruction may be selected for execution and passed to a functional unit (24A to 24C) during the clock cycle in which the associated result is forwarded. In this case, reservation stations (22) route the forwarded result to the functional unit (24). In embodiments in which instructions may be decoded into multiple operations to be executed by functional units (24), the operations may be scheduled independently of one another.

In one embodiment, each of the functional units (24) is configured to perform integer arithmetic operations of addition and subtraction, as well as shifts, rotates, logical operations, and branch operations. The operations are performed in response to the control values decoded for a particular instruction by decode units (20). It is noted that a floating point unit (not shown) may also be employed to provide floating point operations. The floating point unit may be operated as a coprocessor, receiving instructions from MROM unit (34) or reorder buffer (32) and subsequently communicating with reorder buffer (32) to complete the instructions. Additionally, functional units (24) may be configured to perform address generation for the load and store memory operations performed by load/store unit (26).
In one particular embodiment, each functional unit (24) may comprise an address generation unit for generating addresses and an execute unit for performing the remaining functions. The two units may operate independently upon different instructions or operations during a clock cycle.

Each of the functional units (24) also provides information regarding the execution of conditional branch instructions to branch prediction unit (14). If a branch prediction was incorrect, branch prediction unit (14) flushes the instructions that follow the mispredicted branch and have entered the instruction processing pipeline, and causes the required instructions to be fetched from instruction cache (16) or main memory. It is noted that in such situations, the results of instructions in the original program sequence that occur after the mispredicted branch instruction are discarded, including those that were speculatively executed and temporarily stored in load/store unit (26) and reorder buffer (32). It is further noted that branch execution results may be provided by functional units (24) to reorder buffer (32), which may indicate branch mispredictions to functional units (24).

Results produced by functional units (24) are sent to reorder buffer (32) if a register value is being updated, and to load/store unit (26) if the contents of a memory location are changed. If the result is to be stored in a register, reorder buffer (32) stores the result in the location reserved for the value of the register when the instruction was decoded. A plurality of result buses (38) are included for forwarding results from functional units (24) and load/store unit (26). Result buses (38) convey the result generated, as well as the reorder buffer tag identifying the instruction being executed.
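The reorder buffer tag carried on the result buses was described earlier as a line tag combined with an offset tag. A minimal sketch of that split follows, assuming three instructions per line (one per decode unit) and therefore a two-bit offset field. The field width and the function names `make_tag` and `split_tag` are illustrative assumptions, not details given in this description.

```python
# Illustrative only: the description does not specify tag widths or encoding.
INSTRUCTIONS_PER_LINE = 3   # one per decode unit (20A to 20C)
OFFSET_BITS = 2             # assumed: smallest field that can hold offsets 0..2


def make_tag(line: int, offset: int) -> int:
    """Pack a line tag and an offset tag into one reorder buffer tag."""
    if not 0 <= offset < INSTRUCTIONS_PER_LINE:
        raise ValueError("offset must select one instruction within the line")
    if line < 0:
        raise ValueError("line tag must be non-negative")
    return (line << OFFSET_BITS) | offset


def split_tag(tag: int) -> tuple:
    """Recover (line tag, offset tag) from a packed reorder buffer tag."""
    return tag >> OFFSET_BITS, tag & ((1 << OFFSET_BITS) - 1)


tag = make_tag(line=5, offset=2)
print(split_tag(tag))
```

With a layout like this, retiring a whole line only requires comparing the line field, while the offset field selects one of the instructions within that line.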
Load/store unit (26) provides an interface between functional units (24) and data cache (28). In one embodiment, load/store unit (26) is configured with a first load/store buffer having storage locations for data and address information for pending loads or stores that have not accessed data cache (28), and a second load/store buffer having storage locations for data and address information for loads and stores that have accessed data cache (28). For example, the first buffer may comprise 12 locations and the second buffer may comprise 32 locations. Decode units (20) arbitrate for access to load/store unit (26). When the first buffer is full, a decode unit must wait until load/store unit (26) has room for the pending load or store request information. Load/store unit (26) also performs dependency checking for load memory operations against pending store memory operations to ensure that data coherency is maintained. A memory operation is a transfer of data between processor (10) and the main memory subsystem (although the transfer may be accomplished in data cache (28)). Memory operations may be the result of an instruction that utilizes an operand stored in memory, or may be the result of a load/store instruction that causes the data transfer but no other operation.

Data cache (28) is a high speed cache memory provided to temporarily store data being transferred between load/store unit (26) and the main memory subsystem.
In one embodiment, data cache (28) has a capacity of up to 64 kilobytes of data in a two-way set associative structure. It is understood that data cache (28) may be implemented in a variety of specific memory configurations, including a set associative configuration, a fully associative configuration, and a direct-mapped configuration, and in any suitable size of any other configuration.

Bus interface unit (37) is configured to communicate between processor (10) and other components of a computer system via a bus. For example, the bus may be compatible with the EV-6 bus developed by Digital Equipment Corporation. Alternatively, any suitable interconnect structure may be used, including packet-based, unidirectional or bidirectional links. An optional L2 cache interface may also be employed for interfacing to a level two cache.

It is noted that while the embodiment illustrated in FIG. 1 is a superscalar implementation, other embodiments may employ scalar implementations. Furthermore, the number of functional units may vary from embodiment to embodiment. Other embodiments may use a centralized reservation station instead of the individual reservation stations shown in FIG. 1. Furthermore, other embodiments may employ a central scheduler instead of the reservation stations and reorder buffer shown in FIG. 1.

Processor (10) may include a port, shown in the figure as mailbox port (100), which may be queried during execution of an instruction stream. Mailbox port (100) may be configured to be coupled to a service processor or a debug processor external to processor (10). Test data used to query the state of processor (10) may be received through mailbox port (100). Similarly, the resulting state data indicating the state of processor (10) may be conveyed through mailbox port (100) to the service processor or debug processor. Mailbox port (100) may be coupled to one or more units of processor (10); in this embodiment, it is coupled to units such as decode units (20A to 20C), register file (30), instruction cache (16), and functional units (24A to 24C). Information may be conveyed to mailbox port (100) or received from mailbox port (100).

Turning now to FIG. 2, a block diagram of one embodiment of a system for determining the state of a processor is shown. Processor (10) (hereinafter referred to as host processor (10)) is coupled to service processor (140). In some embodiments, service processor (140) may be located in another computer system that is coupled to the computer system in which host processor (10) is implemented. In other embodiments, service processor (140) may be located on a card inserted into a peripheral slot of the computer system in which host processor (10) resides, or may even be mounted on the same circuit board as host processor (10).

As noted above, host processor (10) includes mailbox port (100). Mailbox port (100) may include mailbox input port (102) and mailbox output port (104). Mailbox input port (102) may be configured to couple to a complementary output port (152) of service processor (140). Similarly, mailbox output port (104) may be configured to couple to a complementary input port (154) of service processor (140).

Mailbox input port (102) of host processor (10) may receive test data conveyed from service processor (140). The test data may be conveyed from output port (152) of service processor (140) to mailbox input port (102) of host processor (10).
Similarly, service processor (140) may be configured to receive state data from host processor (10). The state data may be conveyed from mailbox output port (104) of host processor (10) to input port (154). Each input port/output port pair may operate independently of the other. That is, test data may be conveyed from output port (152) to mailbox input port (102) regardless of any conveying of state data from mailbox output port (104) to input port (154).

Test data may be loaded serially into output port (152) of service processor (140) through a test data input (TDI) pin that is part of a test access port (TAP). The test access port may be a boundary scan port conforming to IEEE Standard 1149.1. Similarly, state data may be unloaded serially from input port (154) through a test data output (TDO) pin. Additional details of loading test data into, and unloading state data from, service processor (140) are described below.

Turning now to FIG. 3, a block diagram of one embodiment of a host processor input port coupled to a service processor output port is shown.

In the embodiment shown, mailbox input port (102) of host processor (10) is coupled to mailbox output port (152) of service processor (140). Mailbox input port (102) includes an input register (112). Input register (112) is configured to receive and store test data. The test data may be received from output register (162), which is located in output port (152). Output register (162) and input register (112) may be of various sizes. In one embodiment, output register (162) and input register (112) are configured to store 32 data bits and one valid bit.
Other embodiments having different register sizes, as well as multiple registers (for both the mailbox input port and the mailbox output port), are possible and contemplated.

In order to determine the state of host processor (10), the valid bit of input register (112) must be queried, and test data must then be loaded into output register (162). As noted above, data may be loaded into mailbox output port (152) through a boundary scan TAP conforming to the IEEE 1149.1 standard. In the embodiment shown, output register (162) is coupled to a TDI pin. Test data may be shifted serially into output register (162) through the TDI pin. In addition to the test data, a valid bit may also be set in the output register through the TDI pin.

In order to convey test data to input register (112), it may first be necessary to poll the valid bit to ensure that it is clear. Because this bit is in the clock domain of the host processor, it must first be synchronized to the clock domain of the service processor. The synchronized valid bit is captured into output register (162) and shifted out serially for examination. While the valid bit is being shifted out for examination, test data may be shifted serially into output register (162). If it is determined during the polling of the valid bit of input register (112) that the valid bit is set, this may indicate that test data has been loaded into mailbox input port (102) but has not yet been captured by host processor (10). Thus, while the valid bit is set, the loading of test data into output register (162) and its subsequent conveyance to input register (112) may be inhibited. Upon capturing the test data, the host processor may clear the valid bit in input register (112).
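The poll-while-loading behavior described above can be sketched with a small software model of a scan register. The sketch assumes a 33-bit register (32 data bits plus one valid bit, as in the embodiment above) with the valid bit in the most significant position and bits shifted out least significant first. The bit ordering, the class name `ScanRegister`, and its methods are illustrative assumptions rather than details from this description.

```python
DATA_BITS = 32
WIDTH = DATA_BITS + 1          # 32 data bits plus one valid bit
VALID_POS = DATA_BITS          # assumed position of the valid bit (MSB)


class ScanRegister:
    """Software model of output register (162) as a serial shift register."""

    def __init__(self) -> None:
        self.bits = 0

    def capture(self, host_valid: bool) -> None:
        # Capture the (already synchronized) host valid bit for examination.
        self.bits = int(host_valid) << VALID_POS

    def shift(self, word_in: int) -> int:
        """Shift WIDTH bits: old contents leave on TDO, word_in enters on TDI."""
        word_out = 0
        for i in range(WIDTH):
            tdo = self.bits & 1                       # bit leaving on TDO
            tdi = (word_in >> i) & 1                  # bit entering on TDI
            self.bits = (self.bits >> 1) | (tdi << (WIDTH - 1))
            word_out |= tdo << i
        return word_out


reg = ScanRegister()
reg.capture(host_valid=False)
polled = reg.shift(0xDEADBEEF)            # shift test data in, poll result out
host_valid = bool((polled >> VALID_POS) & 1)
print(host_valid, hex(reg.bits))
```

In this model a single pass of shifting both delivers the polled valid bit and leaves the new test data in the register. If the polled bit had been set, the newly shifted word would simply not be conveyed onward to input register (112).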
Output port (152) may begin conveying the test data to input register (112) in response to detecting that the valid bit stored in input register (112) has been cleared. In this embodiment, a separate TAP instruction is used to set the valid bit. When this instruction is used, the data may be loaded synchronously from output register (162) into input register (112) (see "A" in FIG. 3).

A processor core of host processor (10) may also poll the valid bit in input register (112). Upon detecting that the valid bit is set, which indicates that test data has been successfully loaded into input register (112), the processor core may capture the test data. The test data may then be used to generate state data to be conveyed back to the service processor, as described in further detail below.

Transmit and receive protocols may be implemented for both host processor (10) and service processor (140), for each input port/output port combination. Table 1 below shows the protocol for one embodiment.

In general, a transmitter in the system (i.e., an output port) must poll the host processor valid bit before conveying data to a receiver. Upon detecting that the host processor valid bit is clear, the data may be deposited in the output register and conveyed to the input register of the receiver. The valid bit in the input register is then set, indicating that the data has been successfully conveyed from the transmitter to the receiver. A receiver in the system (i.e., an input port) may poll the host processor valid bit until the valid bit is set. The setting of the host processor valid bit indicates that valid data is present in the input register, and thus allows the receiver to capture the data. After the data has been captured, the host processor valid bit may be cleared. In various embodiments, the receiver in the service processor may periodically shift data out of its input register and check the valid bit immediately after the shift (since the valid bit is part of the data shifted out) in order to determine whether the data is valid.
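The transmitter and receiver rules above amount to a single-entry mailbox guarded by a valid bit. A minimal sketch follows, assuming a software model in which both sides see the same register object. The class and method names are hypothetical, and clock-domain synchronization is ignored.

```python
from typing import Optional


class MailboxRegister:
    """Single-entry register with a valid bit, as seen by both sides."""

    def __init__(self) -> None:
        self.data = 0
        self.valid = False

    def try_send(self, word: int) -> bool:
        """Transmitter side: deposit data only if the valid bit is clear."""
        if self.valid:
            return False          # previous word not yet captured; keep polling
        self.data = word
        self.valid = True         # tells the receiver that valid data is present
        return True

    def try_receive(self) -> Optional[int]:
        """Receiver side: capture data only if the valid bit is set."""
        if not self.valid:
            return None           # nothing new; keep polling
        word = self.data
        self.valid = False        # frees the register for the next transfer
        return word


box = MailboxRegister()
assert box.try_receive() is None      # receiver polls an empty mailbox
assert box.try_send(0x1234)           # transmitter finds the valid bit clear
assert not box.try_send(0x5678)       # second send is held off
assert box.try_receive() == 0x1234    # receiver captures and clears valid
assert box.try_send(0x5678)           # register is free again
```

Because neither side blocks, each can poll at its own pace, which is consistent with the aim of not interrupting instruction execution in the host processor.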
FIG. 4 is a block diagram of one embodiment of a host processor output port coupled to a service processor input port. In the embodiment shown, mailbox output port (104) of host processor (10) is coupled to input port (154) of service processor (140). Mailbox output port (104) includes output register (114), and input port (154) includes input register (164). Output register (114) is configured to store state data received from a processor core of host processor (10). The state data received in output register (114) may be generated in response to previously input test data.

Before any data is conveyed to output register (114), the valid bit stored in output register (114) must first be polled to determine whether it is clear. Upon determining that the valid bit is clear, mailbox output port (104) may deposit the state data in output register (114) and set the valid bit in output register (114). The state data and the valid bit may then be conveyed to input register (164) of the service processor. This is done in the Capture-DR state of the TAP controller. Once the state data and the valid bit value have been captured, the data may be retrieved by shifting it out through the test data output (TDO) pin. If service processor (140) determines that the valid bit is set, then the data shifted out together with the valid bit is valid.

Instructions may be provided for both the service processor and the host processor to control the conveying of data to and from the mailbox input and output ports. In one embodiment, the TAP instructions for the service processor are MBOXIN, MBOXINSETV, MBOXOUT, and MBOXOUTCLRV. Service processor (140) may use the MBOXIN instruction to convey test data from output port (152) to mailbox input port (102). Service processor (140) may execute the MBOXINSETV instruction in response to the conveying of test data, thereby setting the valid bit in input register (112) in order to indicate to host processor (10) that valid data is present. The MBOXOUT instruction may be executed to begin conveying state data from host processor (10) to service processor (140). The MBOXOUTCLRV instruction may be executed to clear the valid bit stored in output register (114).

Instructions executed by host processor (10) may include model specific register (MSR) reads and writes using the mnemonics UC_SPREG_MBOX_IN and UC_SPREG_MBOX_OUT. An MSR read of UC_SPREG_MBOX_IN may begin an access of mailbox input register (112) in order to capture test data from that register. An MSR write of UC_SPREG_MBOX_IN may be used to clear the valid bit. An MSR write of UC_SPREG_MBOX_OUT may begin an access of mailbox output register (114) in order to deposit state data or to set the valid bit. An MSR read of UC_SPREG_MBOX_OUT may be used to detect whether the valid bit has been cleared. It is noted that the mnemonics associated with the instructions are illustrative of one particular embodiment. Embodiments using other mnemonics to describe the specific instructions used with the input and output ports are possible and contemplated.

It is noted that output register (162) and input register (164) of service processor (140) may be any shift register between the TDI pin and the TDO pin, provided that it contains a sufficient number of bit positions to store the test/state data and at least one valid bit.
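To show how the service processor instructions and the host processor MSR accesses named above fit together, the following sketch models them as methods on two registers. It is a behavioral illustration only: the real operations are TAP instructions and MSR accesses whose encodings are not given here, and the class `MailboxModel`, its method names, and the choice to deposit state data and set the valid bit in one step are assumptions.

```python
class MailboxModel:
    """Behavioral model of input register (112) and output register (114)."""

    def __init__(self) -> None:
        self.in_data, self.in_valid = 0, False     # mailbox input register (112)
        self.out_data, self.out_valid = 0, False   # mailbox output register (114)

    # --- service processor side (names mirror the TAP instructions) ---
    def mboxin(self, test_data: int) -> None:
        self.in_data = test_data                   # convey test data to (112)

    def mboxinsetv(self) -> None:
        self.in_valid = True                       # flag valid data to the host

    def mboxout(self):
        return self.out_data, self.out_valid       # state data plus valid bit

    def mboxoutclrv(self) -> None:
        self.out_valid = False                     # free (114) for more data

    # --- host processor side (names mirror the MSR mnemonics) ---
    def read_mbox_in(self):
        return self.in_data, self.in_valid         # capture test data

    def write_mbox_in_clear_valid(self) -> None:
        self.in_valid = False

    def read_mbox_out_valid(self) -> bool:
        return self.out_valid                      # is (114) still occupied?

    def write_mbox_out(self, state_data: int) -> None:
        # Assumed: deposit and set-valid are combined into a single write here.
        self.out_data, self.out_valid = state_data, True


m = MailboxModel()
m.mboxin(0xA5)                     # service processor sends a query
m.mboxinsetv()
data, valid = m.read_mbox_in()     # host polls and captures it
assert valid and data == 0xA5
m.write_mbox_in_clear_valid()
if not m.read_mbox_out_valid():    # output register is free
    m.write_mbox_out(data ^ 0xFF)  # stand-in for real state data
state, valid = m.mboxout()         # service processor retrieves the reply
assert valid and state == 0x5A
m.mboxoutclrv()
```

The XOR used to produce the reply is only a placeholder; in the described system the state data would reflect actual processor state such as register contents.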
It is also noted that other embodiments of service processor (140), in which test data is loaded and state data is captured by mechanisms other than the TAP described above, are possible and contemplated. Additionally, the terms "service processor" and "debug processor" may be used interchangeably in this description, since a debug processor may be substituted for a service processor and may be configured in accordance with this description.

Turning now to FIG. 5A, a flow diagram of one embodiment of a method for querying a host processor is shown. The method begins in step (502) with the polling of a valid bit in an input register of a host processor. During the polling, it is determined in step (504) whether the valid bit is clear or set. If the valid bit is set, polling continues. A set valid bit may indicate that the input register of the host processor contains valid data that has not yet been captured. If the valid bit is clear, test data may be loaded into the output register of the service processor. The test data may be loaded serially through a TDI pin in the manner described above. Other embodiments in which the test data is loaded in parallel are possible and contemplated. After the test data has been loaded into the output register, the output port of the service processor may convey the data to the input register of the host processor in step (508), and the valid bit is set in step (510). The setting of the valid bit in the input register of the host processor may indicate to the processor that valid data is present and ready to be captured. In step (512), the host processor may capture the test data in response to detecting that the valid bit is set, and may generate state data based on the test data.
After the host processor has captured the test data, the host processor may clear the valid bit stored in the input register of the mailbox input port.

FIG. 5B is a flow diagram of one embodiment of a method for outputting state data. The state data may be generated using the method described with reference to FIG. 5A and the mechanisms shown in FIGS. 1 to 4. Method (550) begins in step (552) with the polling of the valid bit in the output register of the host processor. In step (554), in a manner similar to the method described with reference to FIG. 5A, polling of the valid bit continues until it is determined that the valid bit is clear. Once the valid bit is clear, the host processor may, in step (556), load state data into the output register of its mailbox output port. After the state data has been loaded into the output register, the host processor sets the valid bit in the output register in step (560). The service processor may convey the data and the valid bit in the Capture-DR state, and then, in step (562), shift the data and the valid bit out of the input register. If the conveyed valid bit is set, the data is known to be valid. In response to the state data being captured from the input register, the host processor valid bit may be cleared, thereby allowing additional state data to be conveyed from the host processor. It is important to note that the operations described with reference to FIG. 5B may occur independently of the operations described with reference to FIG. 5A.

It is noted that, as used herein, the term "test data" refers to any type of data that may be input to host processor (10) in order to obtain a desired response. Similarly, the term "state data" refers to any type of data that may be used to provide information about the operation of host processor (10). Additionally, it is noted that loading test data and/or capturing state data may be performed in response to individual queries of the processor, or may be performed at regular intervals. Alternative embodiments in which the specific order of polling the valid bit and conveying data differs from the order described herein are also possible and contemplated.

Although the invention has been described with reference to particular embodiments, it should be understood that these embodiments are illustrative and that the scope of the invention is not limited to them. Variations, modifications, additions, and improvements to the embodiments described herein are possible. Such variations, modifications, additions, and improvements may fall within the scope of the invention as detailed in the claims that follow.

[Brief description of the drawings]

Other aspects of the invention will become apparent upon reading the detailed description above in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram of one embodiment of a processor;
FIG. 2 is a block diagram of one embodiment of a system for determining the state of a processor;
FIG. 3 is a block diagram of one embodiment of a host processor input port coupled to a service processor output port;
FIG. 4 is a block diagram of one embodiment of a host processor output port coupled to a service processor input port;
FIG. 5A is a flowchart of one embodiment of a method for querying a host processor; and
FIG. 5B is a flowchart of one embodiment of a method for outputting state data from a host processor.

While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the drawings and their description are not intended to limit the invention to the particular forms disclosed. On the contrary, the invention covers all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the claims.

10  Processor
12  Prefetch/predecode unit
14  Branch prediction unit
16  Instruction cache
18  Instruction alignment unit
20, 20A to 20C  Decode units
22, 22A to 22C  Reservation stations
24, 24A to 24C  Functional units
26  Load/store unit
28  Data cache
30  Register file
32  Reorder buffer
34  MROM unit
37  Bus interface unit
38  Result bus
100  Mailbox port
102  Mailbox input port
104  Mailbox output port
112, 164  Input registers
114, 162  Output registers
140  Service processor
152  Complementary output port
154  Complementary input port

Claims

1. A method for determining a state of a host processor, the method comprising: polling a first valid bit in a first input register until the first valid bit is cleared, wherein the first input register is located in the host processor; loading test data into a first output register, the first output register being located in a service processor; transferring the test data from the first output register to the first input register; setting the first valid bit upon completion of the transfer; and retrieving the test data from the first input register in response to detecting that the first valid bit is set; wherein the loading, the polling, the transferring, the setting, and the retrieving do not interrupt a flow of instructions executing in the host processor.

2. The method of claim 1, further comprising: determining a host processor state based on the test data; and outputting state data to the service processor, the state data indicating the host processor state, wherein the determining and the outputting are performed asynchronously to a host processor clock.

3. The method of claim 2, wherein the outputting comprises: polling a second valid bit in a second output register until the second valid bit is cleared, wherein the second output register is located in the host processor; loading state data into the second output register; transferring the state data from the second output register to a second input register, the second input register being located in the service processor; and outputting the state data from the second input register.

4. The method of claim 1, wherein the first valid bit is cleared in response to the retrieving of the test data.

5. The method of claim 3, wherein the second valid bit is set after loading the second input register.

6. The method of claim 3, wherein the outputting comprises capturing the state data and serially shifting it out of the second input register via a test data output (TDO) pin.

7. The method of claim 6, further comprising clearing the second valid bit in response to completing the shifting out of the state data.

8. The method of claim 3, wherein each of the first and second output registers and each of the first and second input registers is 33 bits in length, and wherein each of the first and second output registers and the first and second input registers is configured to store 32 data bits and one valid bit.

9. The method of claim 3, further comprising synchronizing the second valid bit to a test clock (TCK) domain before transferring the state data from the second output register to the second input register.

10. The method of claim 3, wherein the outputting is performed in response to an individual query.

11. The method of claim 3, wherein the outputting is performed at fixed time intervals.

12. The method of claim 1, wherein the loading comprises serially shifting the test data into the first output register via a test data input (TDI) pin.

13. The method of claim 1, further comprising clearing the first valid bit in response to the retrieving.

14. The method of claim 1, further comprising synchronizing the first valid bit to a host processor clock domain before transferring the test data from the first output register to the first input register.

15. A system for determining a state of a host processor, comprising: a host processor and a service processor, wherein the host processor and the service processor include a first input register and a first output register, respectively, and wherein the host processor and the service processor further include a second output register and a second input register, respectively; wherein the service processor is configured to determine the state of the host processor, the determining comprising: polling a first valid bit in the first input register until the first valid bit is clear; loading test data into the first output register; transferring the test data from the first output register to the first input register; and setting the first valid bit upon completion of the transfer; wherein the host processor is configured to retrieve the test data from the first input register in response to the first valid bit being set; and wherein the loading, the polling, the transferring, the setting, and the retrieving do not interrupt a flow of instructions executing in the host processor.

16. The system of claim 15, wherein the host processor is further configured to: determine a host processor state based on the test data; and output state data to the service processor, the state data indicating a host processor state, wherein the determining and the outputting are performed asynchronously to a host processor clock.

17. The system of claim 16, wherein the host processor is further configured to: poll a second valid bit in the second output register until the second valid bit is cleared; store state data in the second output register; and transfer the state data from the second output register to the second input register; and wherein the service processor is configured to output the state data received by the second input register.

18. The system of claim 17, wherein the host processor is configured to clear the first valid bit in response to retrieving the test data.

19. The system of claim 17, wherein the host processor is configured to set the second valid bit in response to storing state data in the second output register.

20. The system of claim 17, wherein the service processor is configured to serially shift the state data out of the second input register via a test data output (TDO) pin.

21. The system of claim 20, wherein the service processor is configured to clear the second valid bit in response to serially shifting the state data out of the second input register.

22. The system of claim 17, wherein each of the first and second output registers and each of the first and second input registers is 33 bits in length, and wherein each of the first and second output registers and the first and second input registers is configured to store 32 data bits and one valid bit.

23. The system of claim 17, wherein the system is configured to synchronize the second valid bit to a test clock (TCK) domain before transferring the state data from the second output register to the second input register.

24. The system of claim 17, wherein the host processor is configured to output state data in response to a query from the service processor.

25. The system of claim 17, wherein the host processor is configured to output state data to the service processor at fixed time intervals.

26. The system of claim 15, wherein the service processor is configured to serially shift data into the first output register via a test data input (TDI) pin.

27. The system of claim 15, wherein the host processor is configured to clear the first valid bit in response to retrieving the test data from the first input register.

28. The system of claim 15, wherein the system is configured to synchronize the first valid bit to a host processor clock domain before transferring the test data from the first output register to the first input register.

29. A host processor comprising: an input port configured to be coupled to an output register of a service processor, the input port including a register for storing a plurality of bits including a first valid bit; and an output port configured to be coupled to an input register of a service processor; wherein the input port is configured to receive test data from the output register in response to the service processor detecting that the first valid bit is cleared, and wherein the first valid bit is set in response to receiving the test data; wherein the output port is configured to transfer state data to the input register in response to the host processor detecting that a second valid bit is cleared, the state data indicating a state of the host processor and being generated in response to the host processor receiving the test data, wherein the host processor is configured to set the second valid bit in response to the output port storing the state data; and wherein the transferring of test data to the host processor, the generating of state data, and the transferring of state data to the input register are performed without interrupting a flow of instructions executing in the host processor.

30. The host processor of claim 29, wherein the host processor is configured to poll the second valid bit before transferring the state data, wherein the second valid bit is stored in the output port.

31. The host processor of claim 29, wherein the host processor is configured to: retrieve the test data from the input port; and clear the first valid bit in response to retrieving the test data.

32. The host processor of claim 29, wherein the input port is configured to receive 32 bits of test data and the first valid bit from the service processor.

33. The host processor of claim 29, wherein the output port is configured to transfer 32 bits of state data and the second valid bit to the service processor.

34. The host processor of claim 29, wherein the first valid bit is synchronized to a clock domain of the host processor before test data is received from the output register of the service processor.

35. The host processor of claim 29, wherein the second valid bit is synchronized to a test clock (TCK) domain in the service processor before the state data is transferred to the input register.

36. The host processor of claim 29, wherein the host processor is configured to transfer state data to the service processor in response to a query from the service processor.

37. The host processor of claim 29, wherein the host processor is configured to transfer state data to the service processor at fixed time intervals.
