TW200809514A - Stalling of DMA operations in order to do memory migration using a migration in progress bit in the translation control entry mechanism - Google Patents


Info

Publication number
TW200809514A
Authority
TW
Taiwan
Prior art keywords
migration
conversion control
input
page
direct memory
Application number
TW096111878A
Other languages
Chinese (zh)
Other versions
TWI414943B (en)
Inventor
Carl Alfred Bender
Patrick Allen Buckland
Steven Mark Thurber
Adalberto Guillermo Yanes
Original Assignee
IBM
Application filed by IBM
Publication of TW200809514A
Application granted
Publication of TWI414943B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1081 Address translation for peripheral access to main memory, e.g. direct memory access [DMA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Bus Control (AREA)

Abstract

A mechanism for temporarily stalling selected Direct Memory Access (DMA) operations in a physical I/O adapter in order to permit migration of data between physical pages that are subject to access by the physical I/O adapter. When a request for a DMA to a physical page in system memory is received from an input/output adapter, a migration in progress (MIP) bit in a translation control entry (TCE) pointing to the physical page is examined, wherein the MIP bit indicates whether migration of the physical page referenced in the TCE to another location in system memory is currently in progress. If the MIP bit indicates a migration of the physical page is in progress, the DMA from the input/output adapter is temporarily stalled while other DMA operations from other input/output adapters to other physical pages in system memory are allowed to continue.
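The decision rule in the abstract can be illustrated with a small Python model. This is a minimal sketch under stated assumptions and is not the hardware design. The `TCE` and `HostBridge` classes are hypothetical. Once an adapter is stalled, the model assumes all of its DMAs wait until the migration completes. It also assumes that software releases the stall directly after it has retargeted the TCE and cleared the MIP bit, and that only one migration is in flight at a time.

```python
from dataclasses import dataclass, field


@dataclass
class TCE:
    """Hypothetical, simplified translation control entry."""
    real_page: int      # physical page number this I/O bus page maps to
    mip: bool = False   # migration-in-progress bit


@dataclass
class HostBridge:
    """Toy host bridge: one TCE table shared by several I/O adapters."""
    tce_table: dict[int, TCE]
    stalled: set[str] = field(default_factory=set)   # adapters currently stalled

    def dma(self, adapter: str, io_page: int):
        """Return the real page for this DMA, or None if the adapter must wait."""
        if adapter in self.stalled:
            return None                  # later DMAs from a stalled adapter also wait
        tce = self.tce_table[io_page]
        if tce.mip:
            self.stalled.add(adapter)    # stall only the adapter that hit the page
            return None
        return tce.real_page             # every other adapter proceeds normally

    def migration_done(self, io_page: int, new_real_page: int) -> None:
        """Software retargets the TCE, clears MIP, and releases stalled adapters.

        Assumes only one migration is in flight, so all stalls can be released.
        """
        tce = self.tce_table[io_page]
        tce.real_page, tce.mip = new_real_page, False
        self.stalled.clear()
```

The following sequence shows the behavior. With the MIP bit set on the TCE for I/O page 0, adapter `"A"` is stalled on its DMA to that page and on its next DMA as well. Adapter `"B"` still receives real page 500 for its DMA to I/O page 1. After `migration_done(0, 306)` runs, adapter `"A"` resumes and is directed to page 306.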

Description

IX. Description of the Invention:

[Technical Field of the Invention]
The present invention relates to data processing systems, and in particular to the migration of data between physical pages that are accessed by input/output (I/O) devices. More specifically, the present invention relates to temporarily stalling selected direct memory access (DMA) operations in a physical I/O adapter in order to permit the migration of data between physical pages that are accessed by the physical I/O adapter.

[Prior Art]
A computer system may be reconfigured while it is running, without disrupting the data being processed by the system. For example, when multiple operating systems run on a computer, one operating system may be using a particular block of memory that must be reallocated for use by a second operating system. The first operating system must therefore stop using that block of physical memory so that the second operating system can access it. As another example, a problem may be detected in a block of physical memory, in which case it is desirable to remove the memory from operation so that it can be replaced. In either case, the data within a particular physical page must be moved, or use of the physical page must be blocked for a period of time.

If the block of memory is accessed by I/O devices, migrating or blocking use of the physical page becomes difficult. One method of migrating data that is accessed by I/O devices is to disable, temporarily but completely, all arbitration by the I/O adapter associated with the I/O device for access to the particular physical page. Arbitration is the first step of the DMA process. When arbitration is disabled, all DMA operations of the physical I/O adapter are disabled. As a result, for a short period, all DMA operations of a physical I/O adapter that needs to access the page being migrated are completely disabled while arbitration is disabled. During that period, the data in the physical page is migrated or updated. Once the migration or update of the physical page is complete, DMA operations in the I/O adapter can resume.

However, existing adapters, such as those on the industry-standard Peripheral Component Interconnect (PCI) Express bus, allow DMA operations to be temporarily disabled or stalled for data migration. Even so, these adapters require that all DMA operations be temporarily stopped.
This approach has a drawback. Disabling all DMA operations on the I/O adapter adversely affects other in-flight DMA transfers to or from the I/O adapter, and causes the physical I/O adapter to enter an error state.

It would therefore be advantageous to have a mechanism for stalling only selected DMA operations in a physical I/O adapter. Such a mechanism would permit migration of physical pages that are accessed by the physical I/O adapter, while other DMA operations from the physical I/O adapter to other pages in system memory continue.

[Summary of the Invention]
The illustrative embodiments provide a computer-implemented method and data processing system for temporarily stalling selected direct memory access operations in a physical I/O adapter. The stall permits migration of data between physical pages that are accessed by the physical I/O adapter. With the mechanism of the present invention, DMA operations of the selected physical I/O adapter are temporarily stalled while data is migrated between physical pages. Meanwhile, other DMA operations to and from other physical I/O adapters to other pages in system memory are allowed to continue. When an input/output adapter requests a direct memory access to a physical page in system memory, the hardware examines a migration-in-progress (MIP) bit in the translation control entry (TCE) that points to the physical page. The MIP bit indicates whether migration of the physical page referenced in the TCE to another location in system memory is currently in progress. If the MIP bit indicates that migration of the physical page is in progress, the direct memory access from the input/output adapter is temporarily stalled. Other direct memory access operations from other input/output adapters to other physical pages in system memory are allowed to continue.

[Embodiment]
Referring now to the drawings, FIG. 1 depicts a block diagram of a data processing system in which embodiments of the present invention may be implemented. Data processing system 100 is a symmetric multiprocessor (SMP) system that includes a plurality of processors 101, 102, 103, and 104 connected to system bus 106.
For example, data processing system 100 may be an IBM® eServer™, a product of International Business Machines Corporation in Armonk, New York, implemented as a server within a network. Alternatively, a single-processor system may be employed. Also connected to system bus 106 is memory controller/cache 108, which provides an interface to a plurality of local memories 160-163. I/O bus bridge 110 is connected to system bus 106 and provides an interface to I/O bus 112. Memory controller/cache 108 and I/O bus bridge 110 may be integrated as depicted.

Data processing system 100 is a logically partitioned (LPAR) data processing system. However, the present invention is not limited to LPAR systems and may also be implemented in other data processing systems. LPAR data processing system 100 has multiple heterogeneous operating systems (or multiple instances of a single operating system) running simultaneously. Each of these operating systems may have any number of software programs executing within it. Data processing system 100 is logically partitioned so that different PCI input/output adapters (IOAs) 120, 121, 122, 123, and 124, graphics adapter 148, and hard disk adapter 149, or parts thereof, may be assigned to different logical partitions. In this case, graphics adapter 148 provides a connection for a display device (not shown), and hard disk adapter 149 provides a connection to control hard disk 150.

Thus, for example, suppose data processing system 100 is divided into three logical partitions, P1, P2, and P3. Each of the following is assigned to one of the three partitions: PCI I/O adapters 120-124, graphics adapter 148, hard disk adapter 149, each of host processors 101-104, and memory from local memories 160-163. In this example, memories 160-163 may take the form of dual in-line memory modules (DIMMs). DIMMs are not normally assigned to partitions on a per-DIMM basis.
Instead, a partition receives a portion of the overall memory seen by the platform. For example:

- Processor 101, some portion of memory from local memories 160-163, and PCI I/O adapters 123 and 124 may be assigned to logical partition P1.
- Processors 102-103, some portion of memory from local memories 160-163, and PCI I/O adapters 120 and 122 may be assigned to logical partition P2.
- Processor 104, some portion of memory from local memories 160-163, graphics adapter 148, and hard disk adapter 149 may be assigned to logical partition P3.

Each operating system executing within logically partitioned data processing system 100 is assigned to a different logical partition. Each such operating system may therefore access only the I/O adapters within its own logical partition. For example, one instance of the Advanced Interactive Executive (AIX®) operating system may be executing within partition P1. A second instance (copy) of AIX® may be executing within partition P2, and a Linux® or OS/400 operating system may be operating within logical partition P3.

Peripheral Component Interconnect (PCI) host bridges (PHBs) 130, 131, 132, and 133 are connected to I/O bus 112 and provide interfaces to PCI local buses 140, 141, 142, and 143, respectively. PCI I/O adapters 120-121 are connected to PCI local bus 140 through I/O fabric 180, which comprises switches and bridges.
In a similar manner:

- PCI I/O adapter 122 is connected to PCI local bus 141 through I/O fabric 181.
- PCI I/O adapters 123 and 124 are connected to PCI local bus 142 through I/O fabric 182.
- Graphics adapter 148 and hard disk adapter 149 are connected to PCI local bus 143 through I/O fabric 183.

I/O fabrics 180-183 provide interfaces to PCI buses 140-143. A typical PCI host bridge supports between four and eight I/O adapters (for example, expansion slots for add-in connectors). Each PCI I/O adapter 120-124 provides an interface between data processing system 100 and input/output devices, such as other network computers that are clients of data processing system 100.

PCI host bridge 130 provides an interface for PCI bus 140 to connect to I/O bus 112. PCI bus 140 also connects PCI host bridge 130 to I/O fabric 180 and to the service processor mailbox interface and ISA bus access pass-through logic 194. Logic 194 forwards PCI accesses destined for PCI/ISA bridge 193. Non-volatile RAM (NVRAM) storage 192 is connected to ISA bus 196. Service processor 135 is coupled to logic 194 through its local PCI bus 195. Service processor 135 is also connected to processors 101-104 via a plurality of JTAG/I2C buses 134. JTAG/I2C buses 134 are a combination of JTAG/scan buses (see IEEE 1149.1) and Philips I2C buses. Alternatively, JTAG/I2C buses 134 may be replaced by only Philips I2C buses or only JTAG/scan buses. All SP_ATTN signals of host processors 101, 102, 103, and 104 are connected together to an interrupt input signal of the service processor. Service processor 135 has its own local memory 191 and has access to hardware OP-panel 190.
When data processing system 100 is initially powered up, service processor 135 uses JTAG/I2C buses 134 to interrogate the system (host) processors 101-104, memory controller/cache 108, and I/O bridge 110. At the completion of this step, service processor 135 has an inventory and topology understanding of data processing system 100. Service processor 135 also executes built-in self-tests (BISTs), basic assurance tests (BATs), and memory tests on all elements found by interrogating host processors 101-104, memory controller/cache 108, and I/O bridge 110. Service processor 135 gathers and reports any error information for failures detected during the BISTs, BATs, and memory tests.

Data processing system 100 may still reach a meaningful, valid configuration of system resources after the elements found faulty during the BISTs, BATs, and memory tests are taken out. If so, the system is allowed to proceed to load executable code into local (host) memories 160-163. Service processor 135 then releases host processors 101-104 to execute the code loaded into local memories 160-163. While host processors 101-104 execute code from the respective operating systems within data processing system 100, service processor 135 enters a mode of monitoring and reporting errors. The items monitored by service processor 135 include, for example:

- cooling fan speed and operation
- thermal sensors
- power supply regulators
- recoverable and non-recoverable errors reported by processors 101-104, local memories 160-163, and I/O bridge 110

Service processor 135 is responsible for saving and reporting error information related to all the monitored items in data processing system 100. Service processor 135 also takes action based on the type of errors and on defined thresholds. For example, service processor 135 may note excessive recoverable errors on a processor's cache memory and decide that they predict a hard failure.
Based on this determination, service processor 135 may mark that resource for deconfiguration during the current running session and future initial program loads (IPLs). IPLs are also sometimes referred to as a "boot" or "bootstrap."

Data processing system 100 may be implemented using various commercially available computer systems. For example, it may be implemented using the IBM® eServer™ iSeries™ Model 840 system available from International Business Machines Corporation. Such a system may support logical partitioning using the OS/400® operating system, which is also available from International Business Machines Corporation.

Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 1 may vary. For example, other peripheral devices, such as optical disk drives, may be used in addition to or in place of the hardware depicted. The depicted example is not meant to imply architectural limitations with respect to the present invention.

I/O bridge 110 includes table 110a and control logic associated with the table. Translation control entries (TCEs) are stored in TCE table 110a. Table 110a is an I/O address translation and protection mechanism. It controls, on an I/O page basis, the I/O operations that an I/O device may perform on a physical page.

A TCE associates the real address of a physical page of physical memory with the address that the I/O adapter presents on the I/O bus. Each entry associates a particular physical page with a particular I/O bus page. TCE table 110a is indexed by the I/O bus address of the DMA operation. The table ensures that the I/O adapter accesses only the storage locations assigned to it. In addition, the TCE mechanism provides an indirect addressing mechanism that allows the embodiments of the present invention to be implemented.

FIG. 2 depicts a block diagram of an exemplary logically partitioned platform in which the illustrative embodiments may be implemented. The hardware in logically partitioned platform 200 may be implemented as, for example, data processing system 100 in FIG. 1. Logically partitioned platform 200 includes partitioned hardware 230, operating systems (OS) 202, 204, 206, and 208, and platform firmware 210. Operating systems 202, 204, 206, and 208 may be multiple copies of a single operating system or multiple heterogeneous operating systems running simultaneously on logically partitioned platform 200. These operating systems may be implemented using OS/400®, which is designed to interface with partition management firmware such as the Hypervisor. OS/400® is used only as an example in these illustrative embodiments. Other types of operating systems, such as AIX® and Linux®, may also be used depending on the particular implementation. Operating systems 202, 204, 206, and 208 are located in partitions 203, 205, 207, and 209. Hypervisor software is an example of software that may be used to implement platform firmware 210 and is available from International Business Machines Corporation. Firmware is "software" stored in a memory chip that retains its contents without electrical power. Examples include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), and non-volatile random access memory (non-volatile RAM).

These partitions also include partition firmware 211, 213, 215, and 217. The partition firmware may be implemented using initial boot strap code, IEEE-1275 Standard Open Firmware, and runtime abstraction software (RTAS), which is available from International Business Machines Corporation. When partitions 203, 205, 207, and 209 are instantiated, platform firmware 210 loads a copy of the boot strap code into each of them. Thereafter, control is transferred to the boot strap code.
The boot strap code then loads the Open Firmware and RTAS. The processors associated with or assigned to each partition are then dispatched to the partition's memory to execute the partition firmware.

Partitioned hardware 230 includes a plurality of processors 232-238, a plurality of system memory units 240-246, a plurality of input/output adapters (IOAs) 248-262, storage unit 270, and TCE table 272. Each of processors 232-238, memory units 240-246, NVRAM storage 298, and I/O adapters 248-262, or parts thereof, may be assigned to one of multiple partitions within logically partitioned platform 200. Each partition corresponds to one of operating systems 202, 204, 206, and 208.

Platform firmware 210 performs a number of functions and services for partitions 203, 205, 207, and 209 to create and enforce the partitioning of logically partitioned platform 200. Platform firmware 210 is a firmware-implemented virtual machine identical to the underlying hardware. It therefore allows independent OS images 202, 204, 206, and 208 to execute simultaneously by virtualizing the hardware resources of logically partitioned platform 200.

Service processor 290 may be used to provide various services, such as processing platform errors in the partitions. These services may also act as a service agent that reports errors back to a vendor, such as International Business Machines Corporation. Operations of the different partitions may be controlled through a hardware management console, such as hardware management console 280. Hardware management console 280 is a separate data processing system. From it, a system administrator may perform various functions, including reallocating resources to different partitions.

In an LPAR environment, resources or programs in one partition are not permitted to affect operations in another partition.
Furthermore, to be useful, the assignment of resources needs to be fine-grained. For example, assigning all I/O adapters attached to a particular PCI host bridge to the same partition is often unacceptable. Doing so would restrict the configurability of the system, including the ability to move resources dynamically between partitions.

Accordingly, the PCI host bridges that connect I/O adapters to the I/O bus need certain functionality. They must be able to assign resources, such as individual I/O adapters or parts of I/O adapters, to separate partitions. At the same time, they must prevent the assigned resources from affecting other partitions, for example by gaining access to other partitions' resources.

FIG. 3 is a block diagram of the page migration translation process in accordance with an illustrative embodiment of the present invention. Page migration is the process of moving data from one physical memory page to a different memory page. This action should be transparent to the user of the data. For instance, in this illustrative example, page migration comprises moving page 302 in physical memory 304 to a different location in physical memory, page 306. Page migration is made transparent to the user by redirecting the I/O bus page address of the I/O adapter through the translation control entry (TCE) mechanism. The TCE mechanism uses TCE table 308 to identify the physical memory address of the data. TCE table 308 is an example of an address translation lookup table, such as TCE table 110a in FIG. 1. TCE tables direct the DMA accesses of I/O adapters to the proper target memory. After the data has been migrated, they can be changed to point to the new memory address at the new data location.

A TCE table 308 is implemented for each I/O host bridge. It supports all I/O adapters on the secondary buses of any I/O bridges attached to the primary bus. TCE table 308 includes multiple page entries, such as TCEs 310 and 312. An address translation and control mechanism fetches these page entries based on the page address on the I/O bus. One or more TCEs may point to a particular page. As depicted, both TCEs 310 and 312 point to page 302. When page 302 is migrated to page 306, the contents of these TCEs must be changed to point to the new page, page 306. This change to the contents of each TCE is made when the page is migrated, without involving the I/O adapter or its device driver. In this manner, TCEs 310 and 312, which originally pointed to page 302, are changed to point to page 306, the target of the memory migration.
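The TCE indirection described for TCE tables 110a and 308 can be sketched in Python: translation indexed by I/O bus page, per-entry read/write control, and retargeting every entry that points to a migrated page. The sketch makes several assumptions. It uses 4 KiB I/O pages and models the TCE table as a dictionary keyed by I/O bus page number. The names `translate` and `migrate_page` are hypothetical. The order in which software sets the MIP bit, copies the page, retargets the entries, and clears the bit is an assumption for illustration, not a sequence quoted from the patent.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                    # assumption: 4 KiB I/O pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class TCE:
    real_page: int                 # translation information: physical page number
    read_ok: bool = True           # read/write control information
    write_ok: bool = True
    mip: bool = False              # migration-in-progress bit


def translate(tce_table: dict, io_addr: int, is_write: bool) -> int:
    """Map an I/O bus address to a real address, enforcing the TCE's controls."""
    tce = tce_table.get(io_addr >> PAGE_SHIFT)   # table is indexed by I/O bus page
    if tce is None:
        raise PermissionError("no TCE: page not assigned to this adapter")
    if (is_write and not tce.write_ok) or (not is_write and not tce.read_ok):
        raise PermissionError("DMA direction not permitted by TCE")
    return (tce.real_page << PAGE_SHIFT) | (io_addr & PAGE_MASK)


def migrate_page(tce_table: dict, memory: dict, old_page: int, new_page: int) -> int:
    """Hypothetical software flow: move a real page and retarget every TCE using it.

    Only TCE contents change; the adapter and its device driver are not involved.
    Returns the number of TCEs that were retargeted.
    """
    users = [t for t in tce_table.values() if t.real_page == old_page]
    for t in users:
        t.mip = True                             # hardware stalls DMAs through these TCEs
    memory[new_page] = bytes(memory[old_page])   # copy the page contents
    for t in users:
        t.real_page = new_page                   # point the TCE at the migration target
        t.mip = False                            # allow DMAs through this TCE again
    return len(users)
```

Consider two TCEs that both map to real page 302, as TCEs 310 and 312 do in FIG. 3. Calling `migrate_page(table, mem, 302, 306)` retargets both entries and returns 2. After that, the same I/O bus addresses translate into page 306. An I/O bus page with no TCE, or a DMA direction that the entry does not allow, raises `PermissionError`, which mirrors the protection role of the table.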

In the example shown, PCI host bridge 400 includes:
- memory-mapped I/O (MMIO) queues and controls 402;
- MMIO load reply queues and controls 404;
- DMA queues and controls 406;
- address translation and control 408.

When PCI host bridge 400 receives MMIO load and store requests from primary bus 410, it queues and controls them at MMIO queues and controls 402. Each MMIO load or store operation is directed to an MMIO address. From MMIO queues and controls 402, the MMIO load and store requests flow out to secondary bus 412.

PCI host bridge 400 also receives DMA read and write requests from secondary bus 412, which flow into DMA queues and controls 406.
When DMA write and read requests are received from secondary bus 412, PCI host bridge 400 queues and controls them at DMA queues and controls 406. DMA queues and controls 406 may direct address translation and control 408 to fetch translation control entries from TCE table 110a in Figure 1. Address translation and control 408 fetches the entry in the TCE table that corresponds to the supplied DMA address in order to determine the location of the physical memory. The fetched entry is then used to translate and control the DMA read or write request. PCI host bridge 400 also receives MMIO load replies from secondary bus 412, which are queued and controlled at MMIO load reply queues and controls 404. From DMA queues and controls 406, DMA write and read requests flow out to primary bus 410. Likewise, MMIO load replies flow out from MMIO load reply queues and controls 404 to primary bus 410.

Figure 5 is a block diagram of the components, including translation control entry (TCE) migration control, of a PCI host bridge designed in accordance with the illustrative embodiments. Aspects of the present invention apply to all forms of Peripheral Component Interconnect (PCI), including conventional PCI, PCI-X, and PCI Express, and also to other I/O buses. Like PCI host bridge (PHB) 400 in Figure 4, PCI host bridge 500 includes MMIO queues and controls 502, MMIO load reply queues and controls 504, DMA queues and controls 506, and address translation and control 508. These components perform operations similar to those of the corresponding components of PCI host bridge 400 in Figure 4. However, PCI host bridge 500 also includes TCE migration control 510. This component contains logic that delays selected DMA operations so that a physical page can be migrated without affecting other in-progress DMA operations.
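The Python sketch below is a rough model of the lookup performed by address translation and control 408 (and 508 in Figure 5). It indexes a flat TCE table by I/O page number, checks the read/write controls, and forms the physical address. The 4 KB page size, the table layout, the permission encoding, and the names `translate_dma` and `DmaFault` are illustrative assumptions. The text above states only that the entry matching the DMA address gives the physical location and controls the request.

```python
PAGE_SHIFT = 12                      # assumed 4 KB I/O pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Hypothetical encoding of the read/write control field.
PERM_NONE, PERM_READ, PERM_WRITE = 0, 1, 2
PERM_RW = PERM_READ | PERM_WRITE


class DmaFault(Exception):
    """Raised when a DMA request is not permitted by its TCE."""


def translate_dma(tce_table, dma_addr, is_write):
    """Translate an I/O-bus DMA address into a physical address.

    tce_table[i] is a (physical_page_number, permissions) pair for
    I/O page i.
    """
    index = dma_addr >> PAGE_SHIFT
    if index >= len(tce_table):
        raise DmaFault(f"DMA address {dma_addr:#x} is outside the TCE table")
    phys_page, perm = tce_table[index]
    needed = PERM_WRITE if is_write else PERM_READ
    if not perm & needed:
        kind = "write" if is_write else "read"
        raise DmaFault(f"DMA {kind} not permitted at {dma_addr:#x}")
    return (phys_page << PAGE_SHIFT) | (dma_addr & PAGE_MASK)


table = [(0x80, PERM_RW), (0x81, PERM_READ)]
print(hex(translate_dma(table, 0x1234, is_write=False)))  # 0x81234
```

The adapter only ever presents I/O-bus addresses. Rewriting `table[i]` to name a new physical page therefore redirects later DMAs without the adapter or its driver being involved.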
When address translation and control 508 fetches an entry from a TCE table, such as TCE table 110a in Figure 1, TCE migration control 510 receives and examines the entry. It determines whether migration of the physical page of memory associated with that entry is in progress. TCE migration control 510 is described in more detail in Figure 7 below.

Figure 6 shows an exemplary TCE entry designed in accordance with the illustrative embodiments. The TCE entry includes translation information 602, read and write control information 604, and a migration-in-progress (MIP) bit 606. Translation information 602 may include the physical page number of the data, that is, the starting address of the page in memory. Read and write control information 604 may include controls indicating one of the following:
- the page may be accessed only by DMA reads;
- only by DMA writes;
- by both DMA reads and DMA writes;
- not at all.

MIP bit 606 indicates whether the particular physical page of memory associated with the TCE entry is currently being migrated. If MIP bit 606 is set (bit = 1), two kinds of DMA operation are delayed until the page migration is complete: any DMA operation to that page, and any in-progress DMA operations from the same I/O adapter. If MIP bit 606 is off (bit = 0), DMA operations to the page are allowed to continue.

Figure 7 is a block diagram of TCE migration control logic designed in accordance with the illustrative embodiments. TCE migration control logic 700 is a detailed example of TCE migration control logic 510 described above for Figure 5. It may be used when address translation and control, such as address translation and control 508 in Figure 5, fetches a TCE from the TCE table in system memory 702. The fetched TCE 704 is placed in TCE holding register 706 and examined by migration control state machine 708.
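The sketch below pictures the entry format of Figure 6 and the MIP test that migration control state machine 708 applies to the entry in holding register 706. The text names the fields but not their widths or positions. The 64-bit packing used here is therefore purely hypothetical: page number in the upper bits, MIP in bit 2, and read/write controls in bits 1 and 0.

```python
# Hypothetical 64-bit TCE layout (field positions are assumptions):
#   bits 63..12  translation information (physical page number)
#   bit  2       migration-in-progress (MIP) bit
#   bits  1..0   read/write control (bit 0 = read, bit 1 = write)
RPN_SHIFT = 12
MIP_BIT = 1 << 2
READ_OK = 1 << 0
WRITE_OK = 1 << 1


def make_tce(phys_page, readable=True, writable=True, mip=False):
    """Pack the fields into a single integer entry."""
    tce = phys_page << RPN_SHIFT
    if readable:
        tce |= READ_OK
    if writable:
        tce |= WRITE_OK
    if mip:
        tce |= MIP_BIT
    return tce


def phys_page(tce):
    """Extract the physical page number."""
    return tce >> RPN_SHIFT


def migration_in_progress(tce):
    """True when DMA to this page must be stalled (MIP bit = 1)."""
    return bool(tce & MIP_BIT)


def set_mip(tce, on):
    """Return a copy of the entry with the MIP bit set or cleared."""
    return (tce | MIP_BIT) if on else (tce & ~MIP_BIT)


entry = make_tce(0x1F, writable=False)
assert not migration_in_progress(entry)
entry = set_mip(entry, True)
assert migration_in_progress(entry) and phys_page(entry) == 0x1F
```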
Specifically, migration control state machine 708 examines MIP bit 710 in TCE 704. It determines whether the physical page associated with the TCE is currently being migrated to a different location in system memory. For example, if MIP bit 710 is set to 1, migration control state machine 708 sends delay DMA signal 712 to address translation and control 508 in Figure 5. Until TCE migration control logic 700 removes the delay DMA signal, address translation and control 508 blocks two kinds of traffic from the I/O adapter performing the DMA: all DMA write and DMA read requests, and all MMIO load replies.

Note that although DMA write and read requests are not allowed to proceed, DMA read replies are allowed to bypass delayed MMIO load or store requests queued along the path. Allowing this bypass lets address translation and control 508 in Figure 5 re-read the TCE even when the MMIO queues are backed up with outstanding requests.

When delay DMA 712 is signaled, the TCE fetched by address translation and control 508 in Figure 5 is discarded from TCE holding register 706, and TCE re-fetch timer 714 is started (716). TCE re-fetch timer 714 prevents the system from being flooded with TCE re-fetch requests before it has had time to complete the page migration operation. The timer therefore lengthens the interval before address translation and control 508 re-fetches the TCE. This gives system software or firmware time to complete the page migration operation and reset the MIP bit in the TCE to 0. Migration control state machine 708 then waits for TCE re-fetch timer 714 to expire.
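The blocking rule that applies while delay DMA 712 is asserted can be summarized as a small predicate. The transaction names below are invented for illustration. Treating queued MMIO loads and stores as delayed is an assumption based on the note about delayed MMIO requests. In this model, a transaction that may proceed is allowed to bypass blocked transactions ahead of it in the same queue, which is how a DMA read reply gets past a stalled MMIO request.

```python
from enum import Enum, auto


class Txn(Enum):
    """Hypothetical transaction kinds seen by the host bridge."""
    DMA_READ_REQ = auto()
    DMA_WRITE_REQ = auto()
    DMA_READ_REPLY = auto()      # data returned to the adapter
    MMIO_LOAD_REPLY = auto()     # adapter's reply to a processor load
    MMIO_LOAD_OR_STORE = auto()  # processor request queued toward adapter


# Held back while delay DMA is asserted for the stalled adapter.
BLOCKED_WHILE_DELAYED = frozenset({
    Txn.DMA_READ_REQ,
    Txn.DMA_WRITE_REQ,
    Txn.MMIO_LOAD_REPLY,
    Txn.MMIO_LOAD_OR_STORE,
})


def may_proceed(txn, delay_dma):
    """True if the transaction may move forward this cycle."""
    return not delay_dma or txn not in BLOCKED_WHILE_DELAYED


def next_to_issue(queue, delay_dma):
    """Pick the oldest transaction that may proceed; it is allowed to
    bypass blocked transactions queued ahead of it."""
    for txn in queue:
        if may_proceed(txn, delay_dma):
            return txn
    return None


queue = [Txn.MMIO_LOAD_OR_STORE, Txn.DMA_READ_REPLY]
print(next_to_issue(queue, delay_dma=True))   # Txn.DMA_READ_REPLY
print(next_to_issue(queue, delay_dma=False))  # Txn.MMIO_LOAD_OR_STORE
```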
When migration control state machine 708 determines that TCE re-fetch timer 714 has expired (718), it signals (720) address translation and control 508 in Figure 5 to re-fetch the TCE. The re-fetched entry is placed in the holding register. The process continues in this loop until migration control state machine 708 determines that MIP bit 710 in the TCE for the migration operation is 0. At that point, migration control state machine 708 releases delay DMA signal 712, which signals address translation and control 508 in Figure 5 to continue. DMA operations to the physical page referenced by the TCE can then resume.

In another embodiment, TCE re-fetch timer 714 may be omitted. In that case, when address translation and control 508 determines that MIP bit 710 of TCE 704 in holding register 706 is set to 1, the TCE is discarded and re-fetched immediately.

In most cases, the PCI host bridge will not distinguish between operations to and from different I/O adapters, so all DMA and MMIO operations are temporarily stalled. However, as a performance enhancement, PCI-X and PCI Express adapters provide several mechanisms that allow data streams to be distinguished. For example, in PCI-X and PCI Express, the requester ID (bus number, device number, and function number) can be used to associate DMAs. Decoding MMIOs to associate them with a requester ID can be used to tie the MMIO path to the delayed DMA path. For PCI Express, this can be accomplished through the virtual channel mechanism.

A further enhancement is also possible: DMA operations need not be delayed until the first DMA write operation to a page whose MIP bit 710 is set. Implementations are therefore possible in which the delay is postponed as long as possible, or even avoided entirely.
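The two refinements mentioned above might be modeled as follows:
- tying traffic to an adapter by requester ID;
- holding off the stall until the first DMA write to a page whose MIP bit is set.

This is a hedged sketch, not the patent's logic. The class and method names are hypothetical, requester IDs are plain (bus, device, function) tuples, and pages are bare integers.

```python
class SelectiveStaller:
    """Write-triggered, per-requester DMA stalling (illustrative model)."""

    def __init__(self):
        self.migrating = set()   # pages whose TCEs have MIP = 1
        self.stalls = {}         # stalled page -> requesters that wrote it

    def start_migration(self, page):
        self.migrating.add(page)

    def finish_migration(self, page):
        """Firmware cleared MIP: release the page and its requesters."""
        self.migrating.discard(page)
        self.stalls.pop(page, None)

    def _stalled_requesters(self):
        return set().union(*self.stalls.values())

    def may_proceed(self, requester, page, is_write):
        """Return True if this DMA may go now, False if it must wait."""
        if is_write and page in self.migrating:
            # A write would modify the page being copied: stall traffic
            # to/from this page and from this requester from now on.
            self.stalls.setdefault(page, set()).add(requester)
            return False
        if page in self.stalls or requester in self._stalled_requesters():
            return False
        return True  # unmodified page: reads during migration are safe


s = SelectiveStaller()
s.start_migration(0x20)
dev_a, dev_b = (0, 1, 0), (0, 2, 0)     # (bus, device, function)
print(s.may_proceed(dev_a, 0x20, is_write=False))  # True
print(s.may_proceed(dev_a, 0x20, is_write=True))   # False
print(s.may_proceed(dev_a, 0x30, is_write=False))  # False
print(s.may_proceed(dev_b, 0x30, is_write=True))   # True
```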
With this write-triggered enhancement, operations can continue as long as the page being migrated is not modified. If a DMA write to the page being migrated is detected, however, further DMA operations must be delayed both to and from that page and to and from the I/O device that requested the write.

Figure 8 is a flowchart of the process for temporarily and selectively delaying particular DMA operations, designed in accordance with the illustrative embodiments. The process is performed by the hardware address translation and control and by the migration control state machine.

The process begins when the address translation and control logic receives a direct memory access request from the bus and initiates address translation (step 802). Address translation may be performed by accessing the TCE table to obtain a TCE containing the physical page address. The translated address is then applied to the second bus so that the correct physical page associated with the memory request from the first bus can be accessed.

The hardware then determines two things: whether the required TCE is already cached, for example in an I/O data buffer in I/O bridge 110 in Figure 1, and whether the cached data is valid (step 804). If the TCE is cached and valid, the hardware allows the direct memory access to proceed using the cached TCE (step 806). If the TCE is not cached, the hardware delays the direct memory access for that particular request (step 808) until the TCE has been fetched.
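The hardware flow of Figure 8 can be sketched as below, under these assumptions:
- TCEs are integers with a hypothetical MIP bit at position 2;
- the TCE cache is a dictionary of (entry, valid) pairs;
- `fetch_tce` stands in for a read of the TCE table in system memory.

A cached, valid entry is used at once. Otherwise the DMA is held while the entry is fetched. As in the Figure 7 loop, the entry is then discarded and re-fetched after the timer for as long as its MIP bit is set.

```python
import time

MIP = 1 << 2  # hypothetical position of the MIP bit in a TCE


def tce_for_dma(index, tce_cache, fetch_tce, refetch_delay_s=0.0,
                sleep=time.sleep):
    """Return (tce, delayed) for a DMA that needs TCE number `index`.

    tce_cache maps index -> (tce, valid). fetch_tce(index) reads the
    TCE table in system memory. refetch_delay_s models the re-fetch
    timer; 0 models the variant that re-fetches immediately.
    """
    cached = tce_cache.get(index)
    if cached is not None and cached[1]:
        return cached[0], False          # cached and valid: proceed now

    # Not cached (or stale): hold this DMA while the TCE is fetched.
    tce = fetch_tce(index)
    while tce & MIP:                     # migration still in progress
        # Discard the fetched copy, let the timer run, then re-fetch.
        if refetch_delay_s:
            sleep(refetch_delay_s)
        tce = fetch_tce(index)

    tce_cache[index] = (tce, True)       # MIP clear: cache the entry
    return tce, True                     # and remove the DMA delay


reads = iter([0x7000 | MIP, 0x7000])    # firmware clears MIP meanwhile
cache = {}
tce, delayed = tce_for_dma(5, cache, lambda i: next(reads))
print(hex(tce), delayed)                 # 0x7000 True
print(tce_for_dma(5, cache, lambda i: 0))  # (28672, False): cache hit
```

Injecting `sleep` keeps the timer testable; passing `refetch_delay_s=0` gives the timer-less embodiment.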
When the TCE is fetched (step 810), a determination is made as to whether the MIP bit in the entry is set to 1 (step 812). If the MIP bit is not set (MIP bit = 0), the hardware removes the direct memory access delay (step 814). The process then returns to step 806, and the direct memory access is allowed to proceed using the cached TCE. Returning to step 812, if the MIP bit is set to 1, the cached TCE is discarded (step 816) and a TCE re-fetch timer is started (step 818). The migration control state machine then waits for the TCE re-fetch timer to expire (step 820). When the timer expires, the process returns to step 810: the TCE is fetched again from the TCE table, and the process continues from there.

Figure 9 is a flowchart of a process performed by software/firmware to control page migration in accordance with the illustrative embodiments. The process begins when software/firmware initiates a page migration (step 902). Software/firmware sets the MIP bit (MIP bit = 1) in each TCE entry that points to the memory page being migrated (step 904). Setting the MIP bit to 1 signals that a page migration is in progress. In response to the change in the MIP bits, every cached copy of the TCEs is invalidated (step 906). Invalidation of TCE entries is known in the art and can be performed in various ways depending on the platform used.

Software/firmware then issues an MMIO load to each PCI host bridge that uses TCEs whose MIP bits are set to 1 (step 908). These MMIO loads ensure two things:
- any hardware invalidations of the TCEs reach the PCI host bridges before the MMIO load replies return to the processor;
- all writes to the TCEs made before their invalidation have been flushed to memory.

Both guarantees follow from the normal PCI ordering rules. Software/firmware waits for all of the synchronizing MMIO loads to complete (step 910). When the loads have completed, software/firmware copies the contents of the old physical page to the new page, performing software loads from the source page in memory and stores to the target page (step 912). Once the copy is complete, software/firmware sets the TCEs to point to the new page and sets the MIP bits in those TCEs to 0 (step 914).
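The firmware side of Figure 9, up to step 914, can be sketched as below. This is a single-threaded model under simplifying assumptions:
- memory is a dictionary mapping page numbers to bytes;
- each host bridge keeps a dictionary TCE cache;
- for simplicity, the synchronizing load goes to every bridge, and `mmio_load` only marks where that load would be issued;
- real PCI ordering is not reproduced.

The final wait for in-flight DMA reads (steps 916 and 918) is not modeled, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tce:
    phys_page: int
    mip: bool = False


@dataclass
class HostBridge:
    """Hypothetical PHB model with a cache of TCE index -> Tce copy."""
    tce_cache: dict = field(default_factory=dict)

    def invalidate(self, index):
        self.tce_cache.pop(index, None)

    def mmio_load(self):
        """Placeholder for the synchronizing MMIO load (steps 908-910)."""
        return 0


def migrate_page(memory, table, bridges, old_page, new_page):
    """Move old_page to new_page; returns the affected TCE indices."""
    victims = [i for i, t in enumerate(table) if t.phys_page == old_page]
    for i in victims:                    # step 904: set MIP, DMA stalls
        table[i].mip = True
    for phb in bridges:                  # step 906: drop cached copies
        for i in victims:
            phb.invalidate(i)
    for phb in bridges:                  # steps 908-910: MMIO loads
        phb.mmio_load()
    memory[new_page] = bytearray(memory[old_page])  # step 912: copy
    for i in victims:                    # step 914: repoint, clear MIP
        table[i].phys_page = new_page
        table[i].mip = False
    return victims


memory = {1: bytearray(b"data"), 2: bytearray(4)}
table = [Tce(1), Tce(3), Tce(1)]
phb = HostBridge(tce_cache={0: Tce(1), 1: Tce(3)})
moved = migrate_page(memory, table, [phb], old_page=1, new_page=2)
print(moved, [t.phys_page for t in table], bytes(memory[2]))
# [0, 2] [2, 3, 2] b'data'
```

Only the table entries change. As with TCEs 310 and 312 in Figure 3, the adapter keeps using the same I/O-bus addresses.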
Finally, software/firmware waits long enough for all direct memory access read requests that were in process using the old memory page to complete (step 916). Once it determines that all of those direct memory access read operations have completed, software/firmware can declare the page migration complete (step 918).

The invention can take the form of a hardware embodiment, or […] A data processing system suitable for storing and/or executing program code […] coupled directly or indirectly […] through a system bus. […] local memory employed during actual execution of the program code, bulk storage, and […] temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. […] other data processing systems or remote printers or storage devices. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

[…] Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

[Brief Description of the Drawings]
The novel features believed characteristic of the invention are set forth in the appended claims.

The invention itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
Figure 1 is a block diagram of a system in which aspects of the present invention may be implemented;
Figure 2 is a block diagram of an exemplary logically partitioned platform in which the present invention may be implemented;
Figure 3 is a block diagram of a page migration translation process in accordance with an illustrative embodiment of the present invention;
Figure 4 is a block diagram of components in a known PCI host bridge (PHB);
Figure 5 is a block diagram of the components, including translation control entry (TCE) migration control, of a PCI host bridge designed in accordance with an illustrative embodiment of the present invention;
Figure 6 shows an exemplary translation control entry designed in accordance with an illustrative embodiment of the present invention;
Figure 7 is a block diagram of the TCE migration control logic of Figure 5, designed in accordance with an illustrative embodiment of the present invention;
Figure 8 is a flowchart of a process performed by the hardware address translation and control and the migration control state machine, designed in accordance with an illustrative embodiment of the present invention; and
Figure 9 is a flowchart of software/firmware control of page migration, designed in accordance with an illustrative embodiment of the present invention.
[Description of Main Reference Numerals]
100 data processing system
101 processor
102 processor
103 processor
104 processor
106 system bus
108 memory controller/cache
110 I/O bridge
110a translation control entry (TCE) table
112 I/O bus
120 PCI I/O adapter
121 PCI I/O adapter
122 PCI I/O adapter
123 PCI I/O adapter
124 PCI I/O adapter
130 PCI host bridge
131 PCI host bridge
132 PCI host bridge
133 PCI host bridge
134 JTAG/I2C bus
135 service processor
140 PCI bus
141 PCI bus
142 PCI bus
143 PCI bus
148 graphics adapter
149 hard disk adapter
150 hard disk
160 local memory
161 local memory
162 local memory
163 local memory
180 I/O fabric
181 I/O fabric
182 I/O fabric
183 I/O fabric
190 OP panel
192 NVRAM storage
[numeral illegible] bridge
[numeral illegible] service processor mailbox interface and ISA bus access pass-through
195 PCI bus
196 ISA bus
200 logically partitioned platform
202 operating system
203 partition
204 operating system
205 partition
206 operating system
207 partition
208 operating system
209 partition
210 platform firmware
211 partition firmware
213 partition firmware
215 partition firmware
217 partition firmware
230 partitioned hardware
232 processor
234 processor
236 processor
238 processor
240 memory
242 memory
244 memory
246 memory
248 I/O adapter
250 I/O adapter
252 I/O adapter
254 I/O adapter
256 I/O adapter
258 I/O adapter
260 I/O adapter
262 I/O adapter
270 storage
272 TCE table
280 hardware management console
290 service processor
298 NVRAM storage
302 "from" page
304 physical memory
306 "to" page
308 TCE table
310 TCE
312 TCE
400 PCI host bridge
402 MMIO queues and controls
404 MMIO load reply queues and controls
406 DMA queues and controls
408 address translation and control
410 primary bus
412 secondary bus
500 PCI host bridge
502 MMIO queues and controls
504 MMIO load reply queues and controls
506 DMA queues and controls
508 address translation and control
510 TCE migration control
[numeral illegible] TCE
602 translation information
604 control information
606 migration-in-progress (MIP) bit
702 TCE from system memory
706 TCE holding register
708 migration control state machine
710 migration-in-progress (MIP) bit
712 delay DMA
714 TCE re-fetch timer
720 re-fetch TCE

Claims (1)

200809514 十、申請專利範園: 種k擇1±延遲直接記憶體存取操作的電腦實施方法,該 電腦實施方法包含: 一每因應接收來自-輸人/輸$配接器之—直接記憶體存取 貝體頁的4求,檢查指向該實體頁之—轉換控制項中的 遷移進展位TL,其巾該遷移進展位元指出在該轉換控制 項中所指之該實體頁到系統記憶體中另-位置的遷移是否 正在進展;以及 假如該遷移進展位元指出該實體頁的遷移正在進展, 則延遲來自輸入/輸出配接器的該直接記憶體存取,同時繼 續從其他輸入/輸出配接器到系統記憶體中其他實體頁的其 他直接記憶體存取操作。 2·如申請專利範圍第1項的電腦實施方法,進一步包含·· 假如該遷移進展位元指出該實體頁沒有遷移正在進 展’則允許繼續對該實體頁的該直接記憶體存取。 3·如申請專利範圍第1項的電腦實施方法,其中該延遲步驟 包括延遲來自該輸入/輸出配接器的該直接記憶體存取,直 到完成遷移為止。 32 200809514 4.如申請專利範圍第】項 進-步包含: 法’射該檢查步驟 自系統記憶體中之一轉換控 項;以及 j唄表取仔邊轉換控制 放置該轉換控制項於該輸入/輪出配接器之—固持暫存 為中’以檢查在該轉換控制項中該遷移進展 5.如申請翻細f 1 _電腦實施方法,其巾該延遲步驟 包括使來自該輸入/輸歧接器的直接記憶體存取寫入請求 與直接記憶體存取讀取請求,以及來自該輸入/輪出配接器 之冗憶體映射輸入/輸出載入回覆失能。 6·如申請專利範圍帛1項的電腦實施方法,其中該延遲步驟 會被延遲,直到接收到正在遷移之該實體頁的—直接記憶 體存取寫入請求。 7·如申請專利範圍第1項的電腦實施方法,進一步包含: 因應該直接記憶體存取的延遲,拋棄來自該固持暫存 器的該轉換控制項; 33 200809514 從該轉換控制項表重新取得該轉換控制項,直到決定 該直接記憶體存取中該實體頁沒有遷移在進展。 8.如申5胃專利賴第丨項的電腦實施方法,進—步包含: 口應k該固持暫存II拋棄該轉換控制項,激發一重新 取得計時器;以及 因應該重新取得計時器的期滿,從該轉換控制項表重 新取得該轉換控制項,直到決定出該直接記憶體存取中該 實體頁’又有遷移在進展,其中每當該轉換控制項被拋棄且 該重新取得計時器期滿時,該轉換控制項會被重新取得。 9·如申請專利範圍第1項的電腦實施方法,進 一步包含: 因應接收該請求’決定需要進行該直接記憶體存取之 該轉換控制項是否被快取;以及 假如該轉換控制項沒被快取,則延遲該直接記憶體存 取’暫停自該轉換控制項表取得該轉換控制項。 10·如申請專利範圍第9項的電腦實施方法,進一步包含: 假如該轉換與控制項被快取,決定該快取轉換與控制 項是否有效;以及 34 200809514 政,允許該直接記憶體 假如該快取轉換與控制項為有 存取用該快取轉換與控制項繼續。 U.如申請專利範圍第i項的電腦實施方法,其中該實體頁之 該遷移包含: =該遷移進展位元設定在指向該實體頁的所有轉換控 制項中,以指出該實體頁的遷移在進展; 因應設定該遷移進展位元,使該轉換控制項的 本失效; 週邊元件互連 發出一記憶體映射輸入/輸出載入到每一 主橋接器; 因應完成所有記髓映職人/輸域人的蚊,將該 實體頁的内容複製到一新實體頁; 修改該轉換控制項,以指向該新頁; 設定每一遷移進展位元於該轉換控制項中,以指出該 實體頁沒有遷移在進展;以及 / 因應決定所有處理中直接記憶體存取讀取操作被完 成,宣告該遷移完成。 如申請專利範圍第η·電腦實施方法,其中記憶體映射 35 200809514 輪出狀造銳觀㈣項之失效,邮—記憶體映 认/輪it{狀__贿驾赠,抵賴週邊元件 互連主橋接器’並且雜該轉換控制項失效以前之到記憶 體的所有直接記⑽存取寫人會歡綱記憶體。 13. 如申請專侧第η項的電腦實施方法,其中該複製步驟 匕括進仃軟_人_實體頁,以及軟體儲存到該新頁。 14. -種延啦接記麵存轉作的裝置,魏置包含: 一輸入/輸祕接H,用來接收來自一輸入/輪出配接器 對系統記憶體中-實體頁之一直接記憶體存取的請求;以 及 -轉換控制項遷移控制,連接到該輪人/輸出橋接器, 其中該轉換控制項遷移控制包含—遷移控制狀態機哭. 
其中該遷移控制狀態機器因應該輸入/輸出橋接°°器接收 該請求,來檢查在指向該實體頁之_轉換控制項中的一遷 移進展位元,其中該遷移進展位元指出該轉換控制項所指 之該實體頁到系統記憶體中另-位置的遷移是否正在進 展;以及 料該遷移控制狀態機器延遲來自該輸入/輸出配接器 36 200809514 的該直接記憶體存取,同時假如該遷移進展位元指出該實 體頁之遷移正在進展’則繼續從其他輸入/輸出配接器到系 統記憶體中其他實體頁的其他直接記憶體存取操作。 種k擇队遲直接讀體存取操作的資料處理系統,該 資料處理系統包含·· 檢-裝置,因應接收來自一輸入/輸出配接器之一直接 。己體存取對-實體頁的請求,來檢查在指向該實體頁之 T轉換控制項中的―遷移進展位元,其中該遷移進展位元 心出因應魏該請求,在該魏蝴項所狀該實體頁 之遷移到系統記憶體中另一位置是否正在進展;以及 延遲震置,用來延遲來自該輸入/輸$配接器的該直接 此體存取,同時假如該遷移進展位元指出該實體頁之遷 移正在進展,則繼續從其他輸入/輸出配接器到系統記憶體 中一他只體頁的其他直接記憶體存取操作。 16·如申請專利範圍第15項之資料處理系統,進一步包含: 允許裝置’假如該遷移進展位元指出該實體頁沒有遷 矛夕在進展,則允許繼續該直接記憶體存取到該實體頁。 37 200809514 17. 如申請專利範圍第15項之資料處理系統,其中該延遲裝置 延遲來自該輸入/輪出配接器的該直接記憶體存取 成該遷移。 18. 如申請專利範圍第15項之資料處理系統,其中該檢查裝置 進一步包含: 取得裝置,用來取得來自系統記憶體中之—轉換控制 項表的該轉換控制項;以及 放置裝置,用來放置該轉換控制項於該輸入/輪出配接 器中的-固持暫存器中,以檢查在該轉換控制項中的該遷 移進展位元。 19·如申請專利範圍第15項之資料處理系統,其中該延遲裝置 包括失能裝置,用來使來自該輪入/輸出配接器的直接記憶 體存取寫入請求與直接記憶體存取讀取請求,以及來自該 輪入/輸出配接器的記憶體映射輪入/輸出載入回覆失能。 〇·如申睛專利範圍第15項之資料處理系統,其中該實體頁的 遷移包含: 第一設定裝置,用來設定該遷移進展位元於指向該實 38 200809514 體頁的所有轉換控制項,以指出該實體頁的遷移在進展; 械知、、、效衣置’用來因應該遷移進展位元之設定而使該轉 換控制項的快取副本無效; 出衣置用來發出記憶體映射輸入/輸出載入到每一 週邊元件互連主橋接器; 〜2裝置’用來因應完成所有映射輸人/輸出載入的決 疋將該實體頁的内容複製到-新實體頁; :改=,用來修改該轉換控制項,以指向該新頁; 換㈣::疋,,用來設定每—該遷移進展位元於該轉 、’止叫岭實體歧有遷移麵展;以及 宣告裝置,用來闵處仏各上 ^ 操作完成之決定,岐直接記憶齡取讀取 39200809514 X. 
Applying for a patent garden: A computer implementation method for k-selecting 1±delayed direct memory access operation, the computer implementation method includes: a direct memory for each input from the input/output/connector Accessing the request of the body page, checking the migration progress bit TL in the conversion control item, the migration progress bit indicating the physical page to the system memory indicated in the conversion control item Whether the migration of the other location is progressing; and if the migration progress bit indicates that the migration of the physical page is progressing, delaying the direct memory access from the input/output adapter while continuing from other inputs/outputs Other direct memory access operations from the adapter to other physical pages in the system memory. 2. The computer implementation method of claim 1 of the patent scope further includes · allowing the direct memory access to the physical page to continue if the migration progress bit indicates that the physical page is not being migrated. 3. The computer-implemented method of claim 1, wherein the delaying step comprises delaying the direct memory access from the input/output adapter until the migration is completed. 32 200809514 4. If the scope of the patent application is in the following steps: the method of shooting the inspection step from one of the system memory; and the parameter of the conversion control to place the conversion control item on the input /rounding the adapter - holding the temporary storage as "to check the progress of the migration in the conversion control item. 5. If the application is to refine the f 1 _ computer implementation method, the delay step of the towel includes the input/transmission from the input/transmission The direct memory access write request and the direct memory access read request of the interface, and the redundant body mapped input/output load reply failure from the input/round out adapter. 6. 
A computer implementation method as claimed in claim 1, wherein the delay step is delayed until a direct memory access write request for the physical page being migrated is received. 7. The computer implementation method of claim 1, further comprising: discarding the conversion control item from the holding register due to a delay of direct memory access; 33 200809514 retrieving from the conversion control item table The conversion control item until the physical memory page in the direct memory access is not migrated in progress. 8. For the computer implementation method of Shen 5 stomach patent Lai Di, the further steps include: the mouth should be the holding temporary storage II to abandon the conversion control item, trigger a re-acquisition timer; and because the timer should be re-acquired After expiration, the conversion control item is re-acquired from the conversion control item table until it is determined that the physical page in the direct memory access has migration progressing, wherein whenever the conversion control item is discarded and the re-acquisition is timed When the device expires, the conversion control will be retrieved. 9. The computer implementation method of claim 1, further comprising: determining whether the conversion control item requiring the direct memory access is cached in response to receiving the request; and if the conversion control item is not fasted If the access is delayed, the direct memory access is paused to obtain the conversion control item from the conversion control item table. 10. The computer implementation method of claim 9, further comprising: if the conversion and control items are cached, determining whether the cache conversion and control items are valid; and 34 200809514, allowing the direct memory to be The cache conversion and control items continue with the cache conversion and control items for access. U. 
The computer-implemented method of claim 1, wherein migration of the physical page comprises: setting the migration-in-progress bit in all translation control entries pointing to the physical page to indicate that migration of the physical page is in progress; invalidating the translation control entries in response to setting the migration-in-progress bit; issuing a memory-mapped input/output load to each peripheral component interconnect host bridge; copying the contents of the physical page to a new physical page in response to completion of all of the memory-mapped input/output loads; modifying the translation control entries to point to the new page; setting each migration-in-progress bit in the translation control entries to indicate that migration of the physical page is not in progress; and declaring the migration complete in response to a determination that all in-process direct memory access read operations are complete.

12. The computer-implemented method of claim 11, wherein the memory-mapped input/output loads flush the invalidation of the translation control entries to the peripheral component interconnect host bridges and cause all direct memory access writes to memory issued before the translation control entries were invalidated to be written to memory.

13. The computer-implemented method of claim 11, wherein the copying step comprises software loading the contents of the physical page and software storing the contents to the new page.

14. An apparatus for stalling direct memory access operations, the apparatus comprising: an input/output bridge for receiving a direct memory access request from an input/output adapter to a physical page in system memory; and a translation control entry migration control connected to the input/output bridge, wherein the translation control entry migration control comprises a migration control state machine; wherein, in response to the input/output bridge receiving the request, the migration control state machine checks a migration-in-progress bit in a translation control entry pointing to the physical page, the migration-in-progress bit indicating whether migration of the physical page indicated by the translation control entry to another location in system memory is in progress; and wherein, if the migration-in-progress bit indicates that migration of the physical page is in progress, the migration control state machine stalls the direct memory access from the input/output adapter while continuing to allow other direct memory access operations from other input/output adapters to other physical pages in system memory.

15. A data processing system for stalling direct memory access operations, the data processing system comprising: checking means for checking, in response to receiving a direct memory access request from an input/output adapter to a physical page in system memory, a migration-in-progress bit in a translation control entry pointing to the physical page, wherein the migration-in-progress bit indicates whether migration of the physical page to another location in system memory is in progress; and stalling means for stalling the direct memory access from the input/output adapter if the migration-in-progress bit indicates that migration of the physical page is in progress, while continuing to allow other direct memory access operations from other input/output adapters to other physical pages in system memory.

16. The data processing system of claim 15, further comprising: allowing means for allowing the direct memory access to the physical page if the migration-in-progress bit indicates that the physical page is not being migrated.

17. The data processing system of claim 15, wherein the stalling means stalls the direct memory access from the input/output adapter until the migration is complete.

18. The data processing system of claim 15, wherein the checking means further comprises: retrieving means for retrieving the translation control entry from a translation control entry table in system memory; and placing means for placing the translation control entry in a holding register in the input/output adapter to check the migration-in-progress bit in the translation control entry.

19. The data processing system of claim 15, wherein the stalling means comprises disabling means for disabling direct memory access write requests and direct memory access read requests from the input/output adapter, as well as memory-mapped input/output load replies from the input/output adapter.

20. The data processing system of claim 15, wherein migration of the physical page comprises: first setting means for setting the migration-in-progress bit in all translation control entries pointing to the physical page to indicate that migration of the physical page is in progress; invalidating means for invalidating cached copies of the translation control entries in response to setting the migration-in-progress bit; issuing means for issuing a memory-mapped input/output load to each peripheral component interconnect host bridge; copying means for copying the contents of the physical page to a new physical page in response to completion of all of the memory-mapped input/output loads; modifying means for modifying the translation control entries to point to the new page; second setting means for setting each migration-in-progress bit in the translation control entries to indicate that migration of the physical page is not in progress; and declaring means for declaring the migration complete in response to a determination that all in-process direct memory access read operations are complete.
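The stall-and-migrate behaviour recited in the claims above can be illustrated with a small software simulation. This is a hypothetical Python model, not IBM's implementation and not a real hardware or firmware interface: the names `TCE`, `HostBridge`, `begin_migration`, `finish_migration` and the 4 KiB page size are assumptions made for illustration, and only DMA writes are modeled. A DMA write whose translation control entry has the migration-in-progress (MIP) bit set stalls only the requesting adapter; the migration then copies the page, retargets the entries, clears the bit and replays the stalled requests in order. Hardware-side steps (invalidating cached TCE copies, issuing MMIO loads to each PCI host bridge, draining in-process DMA reads) are noted in comments but not simulated.

```python
from collections import deque
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this simulation


@dataclass
class TCE:
    """Hypothetical translation control entry: I/O page -> real page."""
    real_page: int
    mip: bool = False  # migration-in-progress bit


@dataclass
class HostBridge:
    """Toy I/O bridge that translates DMA writes through a TCE table."""
    tce_table: list  # indexed by I/O page number
    memory: dict     # real page number -> bytearray
    stalled: dict = field(default_factory=dict)  # adapter -> deque of requests

    def dma_write(self, adapter, io_page, offset, data):
        req = (io_page, offset, bytes(data))
        if adapter in self.stalled:
            # Adapter already stalled: queue behind its earlier requests to keep order.
            self.stalled[adapter].append(req)
            return "stalled"
        tce = self.tce_table[io_page]
        if tce.mip:
            # Target page is migrating: stall this adapter only; others keep running.
            self.stalled[adapter] = deque([req])
            return "stalled"
        self.memory[tce.real_page][offset:offset + len(data)] = data
        return "done"

    def resume(self):
        """Replay stalled requests in order; any of them may stall again if MIP is set."""
        pending, self.stalled = self.stalled, {}
        for adapter, queue in pending.items():
            for io_page, offset, data in queue:
                self.dma_write(adapter, io_page, offset, data)


def begin_migration(bridge, old_page):
    """Set MIP in every TCE pointing at old_page and return those entries."""
    entries = [t for t in bridge.tce_table if t.real_page == old_page]
    for t in entries:
        t.mip = True
    # Not modeled: invalidating cached TCE copies and issuing an MMIO load to
    # each PCI host bridge to flush DMA writes that were already in flight.
    return entries


def finish_migration(bridge, entries, old_page, new_page):
    """Copy the page, retarget the TCEs, clear MIP, and release stalled DMA."""
    bridge.memory[new_page] = bytearray(bridge.memory[old_page])
    for t in entries:
        t.real_page = new_page
        t.mip = False
    # Not modeled: waiting for in-process DMA reads before declaring completion.
    bridge.resume()
```

For example, after `begin_migration(bridge, 0)`, a write from adapter "A" to I/O page 0 returns "stalled" while a write from adapter "B" to a different page still completes; `finish_migration` then lands A's write on the new real page. The sketch stalls per adapter rather than per request so that every later request from a stalled adapter queues behind the first one, which keeps that adapter's DMA ordering intact.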
TW096111878A 2006-04-17 2007-04-03 Stalling of dma operations in order to do memory migration using a migration in progress bit in the translation control entry mechanism TWI414943B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/279,906 US8621120B2 (en) 2006-04-17 2006-04-17 Stalling of DMA operations in order to do memory migration using a migration in progress bit in the translation control entry mechanism

Publications (2)

Publication Number Publication Date
TW200809514A true TW200809514A (en) 2008-02-16
TWI414943B TWI414943B (en) 2013-11-11

Family

ID=38662426

Family Applications (1)

Application Number Title Priority Date Filing Date
TW096111878A TWI414943B (en) 2006-04-17 2007-04-03 Stalling of dma operations in order to do memory migration using a migration in progress bit in the translation control entry mechanism

Country Status (4)

Country Link
US (1) US8621120B2 (en)
JP (1) JP4898525B2 (en)
CN (1) CN100495375C (en)
TW (1) TWI414943B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9547602B2 (en) 2013-03-14 2017-01-17 Nvidia Corporation Translation lookaside buffer entry systems and methods
US9569214B2 (en) 2012-12-27 2017-02-14 Nvidia Corporation Execution pipeline data forwarding
US9582280B2 (en) 2013-07-18 2017-02-28 Nvidia Corporation Branching to alternate code based on runahead determination
US9632976B2 (en) 2012-12-07 2017-04-25 Nvidia Corporation Lazy runahead operation for a microprocessor
US9645929B2 (en) 2012-09-14 2017-05-09 Nvidia Corporation Speculative permission acquisition for shared memory
US9740553B2 (en) 2012-11-14 2017-08-22 Nvidia Corporation Managing potentially invalid results during runahead
US9823931B2 (en) 2012-12-28 2017-11-21 Nvidia Corporation Queued instruction re-dispatch after runahead
US9875105B2 (en) 2012-05-03 2018-01-23 Nvidia Corporation Checkpointed buffer for re-entry from runahead
US9880846B2 (en) 2012-04-11 2018-01-30 Nvidia Corporation Improving hit rate of code translation redirection table with replacement strategy based on usage history table of evicted entries
US10001996B2 (en) 2012-10-26 2018-06-19 Nvidia Corporation Selective poisoning of data during runahead
US10146545B2 (en) 2012-03-13 2018-12-04 Nvidia Corporation Translation address cache for a microprocessor
US10241810B2 (en) 2012-05-18 2019-03-26 Nvidia Corporation Instruction-optimizing processor with branch-count table in hardware
US10324725B2 (en) 2012-12-27 2019-06-18 Nvidia Corporation Fault detection in instruction translations

Families Citing this family (51)

Publication number Priority date Publication date Assignee Title
US7574537B2 (en) * 2005-02-03 2009-08-11 International Business Machines Corporation Method, apparatus, and computer program product for migrating data pages by disabling selected DMA operations in a physical I/O adapter
US7500072B2 (en) * 2006-04-25 2009-03-03 International Business Machines Corporation Migrating data that is subject to access by input/output devices
TWI328167B (en) * 2007-05-04 2010-08-01 Inventec Corp Method for accessing memory
US8224885B1 (en) 2009-01-26 2012-07-17 Teradici Corporation Method and system for remote computing session management
US8918488B2 (en) 2009-02-04 2014-12-23 Citrix Systems, Inc. Methods and systems for automated management of virtual resources in a cloud computing environment
US8615645B2 (en) 2010-06-23 2013-12-24 International Business Machines Corporation Controlling the selectively setting of operational parameters for an adapter
US8549182B2 (en) 2010-06-23 2013-10-01 International Business Machines Corporation Store/store block instructions for communicating with adapters
US8626970B2 (en) 2010-06-23 2014-01-07 International Business Machines Corporation Controlling access by a configuration to an adapter function
US8510599B2 (en) 2010-06-23 2013-08-13 International Business Machines Corporation Managing processing associated with hardware events
US8504754B2 (en) 2010-06-23 2013-08-06 International Business Machines Corporation Identification of types of sources of adapter interruptions
US8650337B2 (en) 2010-06-23 2014-02-11 International Business Machines Corporation Runtime determination of translation formats for adapter functions
US9195623B2 (en) * 2010-06-23 2015-11-24 International Business Machines Corporation Multiple address spaces per adapter with address translation
US8566480B2 (en) 2010-06-23 2013-10-22 International Business Machines Corporation Load instruction for communicating with adapters
US8639858B2 (en) 2010-06-23 2014-01-28 International Business Machines Corporation Resizing address spaces concurrent to accessing the address spaces
US8635430B2 (en) 2010-06-23 2014-01-21 International Business Machines Corporation Translation of input/output addresses to memory addresses
US8621112B2 (en) 2010-06-23 2013-12-31 International Business Machines Corporation Discovery by operating system of information relating to adapter functions accessible to the operating system
US8572635B2 (en) 2010-06-23 2013-10-29 International Business Machines Corporation Converting a message signaled interruption into an I/O adapter event notification
US8650335B2 (en) 2010-06-23 2014-02-11 International Business Machines Corporation Measurement facility for adapter functions
US9342352B2 (en) 2010-06-23 2016-05-17 International Business Machines Corporation Guest access to address spaces of adapter
US8505032B2 (en) 2010-06-23 2013-08-06 International Business Machines Corporation Operating system notification of actions to be taken responsive to adapter events
US8468284B2 (en) 2010-06-23 2013-06-18 International Business Machines Corporation Converting a message signaled interruption into an I/O adapter event notification to a guest operating system
US8478922B2 (en) 2010-06-23 2013-07-02 International Business Machines Corporation Controlling a rate at which adapter interruption requests are processed
US9213661B2 (en) * 2010-06-23 2015-12-15 International Business Machines Corporation Enable/disable adapters of a computing environment
US8407389B2 (en) * 2010-07-20 2013-03-26 International Business Machines Corporation Atomic operations with page migration in PCIe
US8904115B2 (en) * 2010-09-28 2014-12-02 Texas Instruments Incorporated Cache with multiple access pipelines
US20120254582A1 (en) * 2011-03-31 2012-10-04 Ashok Raj Techniques and mechanisms for live migration of pages pinned for dma
EP2626757A1 (en) * 2012-02-08 2013-08-14 Intel Mobile Communications Technology Dresden GmbH Finite state machine for system management
CN102645609B (en) * 2012-03-30 2014-12-10 上海斐讯数据通信技术有限公司 Joint test action group (JTAG) link circuit test device and test method of JTAG chain circuit test device
US20140089553A1 (en) * 2012-09-24 2014-03-27 Broadcom Corporation Interface between a host and a peripheral device
US10108424B2 (en) 2013-03-14 2018-10-23 Nvidia Corporation Profiling code portions to generate translations
US10404795B2 (en) * 2014-02-19 2019-09-03 Vmware, Inc. Virtual machine high availability using shared storage during network isolation
US9563572B2 (en) * 2014-12-10 2017-02-07 International Business Machines Corporation Migrating buffer for direct memory access in a computer system
US10810031B2 (en) * 2015-09-28 2020-10-20 Red Hat Israel, Ltd. Dirty memory tracking with assigned devices by exitless paravirtualization
US9529759B1 (en) 2016-01-14 2016-12-27 International Business Machines Corporation Multipath I/O in a computer system
US10042720B2 (en) 2016-02-22 2018-08-07 International Business Machines Corporation Live partition mobility with I/O migration
US10002018B2 (en) 2016-02-23 2018-06-19 International Business Machines Corporation Migrating single root I/O virtualization adapter configurations in a computing system
US10042723B2 (en) 2016-02-23 2018-08-07 International Business Machines Corporation Failover of a virtual function exposed by an SR-IOV adapter
US10025584B2 (en) 2016-02-29 2018-07-17 International Business Machines Corporation Firmware management of SR-IOV adapters
US9720863B1 (en) 2016-10-21 2017-08-01 International Business Machines Corporation Migrating MMIO from a source I/O adapter of a source computing system to a destination I/O adapter of a destination computing system
US9740647B1 (en) * 2016-10-21 2017-08-22 International Business Machines Corporation Migrating DMA mappings from a source I/O adapter of a computing system to a destination I/O adapter of the computing system
US9715469B1 (en) 2016-10-21 2017-07-25 International Business Machines Corporation Migrating interrupts from a source I/O adapter of a source computing system to a destination I/O adapter of a destination computing system
US9720862B1 (en) 2016-10-21 2017-08-01 International Business Machines Corporation Migrating interrupts from a source I/O adapter of a computing system to a destination I/O adapter of the computing system
US9760512B1 (en) 2016-10-21 2017-09-12 International Business Machines Corporation Migrating DMA mappings from a source I/O adapter of a source computing system to a destination I/O adapter of a destination computing system
US9785451B1 (en) 2016-10-21 2017-10-10 International Business Machines Corporation Migrating MMIO from a source I/O adapter of a computing system to a destination I/O adapter of the computing system
CN110765462B (en) * 2018-07-28 2023-06-27 阿里巴巴集团控股有限公司 Operation control method and device, computing system and electronic equipment
DE112019007388T5 (en) 2019-05-31 2022-02-17 Micron Technology, Inc. JTAG-BASED ARCHITECTURE ENABLING MULTI-CORE OPERATION
CN110321225B (en) * 2019-07-08 2021-04-30 腾讯科技(深圳)有限公司 Load balancing method, metadata server and computer readable storage medium
CN112506828B (en) * 2020-12-18 2024-05-17 展讯半导体(成都)有限公司 Transmission configuration method and device for direct memory access
US11567666B2 (en) * 2021-03-24 2023-01-31 Ati Technologies Ulc Handling the migration of pages of memory accessible by input-output devices
US20230325321A1 (en) * 2022-04-11 2023-10-12 Samsung Electronics Co., Ltd. Systems and methods for pre-populating address translation cache
US11947801B2 (en) * 2022-07-29 2024-04-02 Intel Corporation In-place memory copy during remote data transfer in heterogeneous compute environment

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
US5377337A (en) * 1993-06-08 1994-12-27 International Business Machines Corporation Method and means for enabling virtual addressing control by software users over a hardware page transfer control entity
US6457072B1 (en) * 1999-07-29 2002-09-24 Sony Corporation System and method for effectively performing physical direct memory access operations
GB2359906B (en) * 2000-02-29 2004-10-20 Virata Ltd Method and apparatus for DMA data transfer
US6785759B1 (en) * 2000-05-10 2004-08-31 International Business Machines Corporation System and method for sharing I/O address translation caching across multiple host bridges
US6654818B1 (en) * 2000-06-22 2003-11-25 International Business Machines Corporation DMA access authorization for 64-bit I/O adapters on PCI bus
US6678755B1 (en) * 2000-06-30 2004-01-13 Micron Technology, Inc. Method and apparatus for appending memory commands during a direct memory access operation
US6816921B2 (en) * 2000-09-08 2004-11-09 Texas Instruments Incorporated Micro-controller direct memory access (DMA) operation with adjustable word size transfers and address alignment/incrementing
US6931471B2 (en) * 2002-04-04 2005-08-16 International Business Machines Corporation Method, apparatus, and computer program product for migrating data subject to access by input/output devices
US6941436B2 (en) * 2002-05-09 2005-09-06 International Business Machines Corporation Method and apparatus for managing memory blocks in a logical partitioned data processing system
US7103728B2 (en) * 2002-07-23 2006-09-05 Hewlett-Packard Development Company, L.P. System and method for memory migration in distributed-memory multi-processor systems
US6804729B2 (en) * 2002-09-30 2004-10-12 International Business Machines Corporation Migrating a memory page by modifying a page migration state of a state machine associated with a DMA mapper based on a state notification from an operating system kernel
US7117385B2 (en) * 2003-04-21 2006-10-03 International Business Machines Corporation Method and apparatus for recovery of partitions in a logical partitioned data processing system
JP2005122640A (en) 2003-10-20 2005-05-12 Hitachi Ltd Server system and method for sharing i/o slot
US7146482B2 (en) * 2003-11-25 2006-12-05 International Business Machines Corporation Memory mapped input/output emulation

Cited By (16)

Publication number Priority date Publication date Assignee Title
US10146545B2 (en) 2012-03-13 2018-12-04 Nvidia Corporation Translation address cache for a microprocessor
US9880846B2 (en) 2012-04-11 2018-01-30 Nvidia Corporation Improving hit rate of code translation redirection table with replacement strategy based on usage history table of evicted entries
US9875105B2 (en) 2012-05-03 2018-01-23 Nvidia Corporation Checkpointed buffer for re-entry from runahead
US10241810B2 (en) 2012-05-18 2019-03-26 Nvidia Corporation Instruction-optimizing processor with branch-count table in hardware
US9645929B2 (en) 2012-09-14 2017-05-09 Nvidia Corporation Speculative permission acquisition for shared memory
US10628160B2 (en) 2012-10-26 2020-04-21 Nvidia Corporation Selective poisoning of data during runahead
US10001996B2 (en) 2012-10-26 2018-06-19 Nvidia Corporation Selective poisoning of data during runahead
US9740553B2 (en) 2012-11-14 2017-08-22 Nvidia Corporation Managing potentially invalid results during runahead
US9632976B2 (en) 2012-12-07 2017-04-25 Nvidia Corporation Lazy runahead operation for a microprocessor
US9891972B2 (en) 2012-12-07 2018-02-13 Nvidia Corporation Lazy runahead operation for a microprocessor
US10324725B2 (en) 2012-12-27 2019-06-18 Nvidia Corporation Fault detection in instruction translations
US9569214B2 (en) 2012-12-27 2017-02-14 Nvidia Corporation Execution pipeline data forwarding
US9823931B2 (en) 2012-12-28 2017-11-21 Nvidia Corporation Queued instruction re-dispatch after runahead
US9547602B2 (en) 2013-03-14 2017-01-17 Nvidia Corporation Translation lookaside buffer entry systems and methods
US9804854B2 (en) 2013-07-18 2017-10-31 Nvidia Corporation Branching to alternate code based on runahead determination
US9582280B2 (en) 2013-07-18 2017-02-28 Nvidia Corporation Branching to alternate code based on runahead determination

Also Published As

Publication number Publication date
JP4898525B2 (en) 2012-03-14
TWI414943B (en) 2013-11-11
CN101059786A (en) 2007-10-24
CN100495375C (en) 2009-06-03
US20070260768A1 (en) 2007-11-08
US8621120B2 (en) 2013-12-31
JP2007287140A (en) 2007-11-01
