JP3663393B2 - Method, processor unit and computer system for checkpointing a multi-processor data processing system

Info

Publication number: JP3663393B2
Application number: JP2002185571A
Authority: JP
Inventors: ドクトル・ハリー・シュテファン・バロフスキ; ドクトル・ハルトムート・シュヴェルマー; ハンス＝ヴェルナー・タスト
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2001-06-27
Filing date: 2002-06-26
Publication date: 2005-06-22
Anticipated expiration: 2022-06-26
Also published as: US6968476B2; US20030005265A1; JP2003067184A

Description

【０００１】
【発明の属する技術分野】
本発明は、エラーリカバリを提供するための、複数プロセッサ・データ処理システムのチェックポインティング（checkpointing）の方法およびシステムに関する。
【０００２】
【従来の技術】
現代のプロセッサで高い命令レベルの並列性を可能にするために、複数の命令を、並列に実行し、最終的にリタイヤ（retire）させることができる。これは、ＣＩＳＣプロセッサの複雑な命令が複数のより単純なＲＩＳＣ様命令に変換される場合と、１サイクル毎に実行される命令の数（ＩＰＣ）を高くしなければならない場合に必須である。これらの命令のリタイヤは、設計済みレジスタ・アレイ（architected register array）の内容が、内部命令の結果によって更新され、対応するストア・データが、キャッシュ／メモリにライト・バックされることを意味する。プログラムによって与えられる命令シーケンスを反映するために、リタイヤすなわち命令の完了は、概念上の順序で発生する。したがって、用語「若い」命令および「古い」命令は、それぞれ、命令シーケンス内で後にまたは早期に見つかる命令を表す。チェックポインティングは、設計済みレジスタおよびデータ・キャッシュに保管された対応するデータの状態のスナップショットが、ある頻度すなわち、固定された時間間隔でとられることを意味する。スナップショットが、すべてのサイクルでとられる場合に、最高の分解能が得られる。そのような従来技術のチェックポインティング方法が、米国特許第５４１８９１６号に開示されている。チェックポイント再試行機能は、通常動作中にストア・キューを確立し、再試行動作中にチェックポイント再試行に必要なデータを供給するために、ストア・バッファを使用する。そこにバッファリングされるデータには、浮動小数点レジスタ、汎用レジスタ、およびアクセス・レジスタのレジスタ・データと、プログラム状況ワードも含まれる。
【０００３】
これは、基本的に、処理ユニットのそれぞれのＬ１キャッシュに関連する複数のストア・バッファの助けを得て行われる。ストア・バッファのそれぞれは、ストレージ・データを、他のＣＰＵがそのデータにアクセスできるストレージ階層内の他の部分に解放できるようになるまで、そのストレージ・データを保持するための中間バッファとして使用される。
【０００４】
ストレージ・データの解放を制御するために、２つの情報ビットすなわち、「命令の終り（end of instruction）」（ＥＯＩ）ビットと「チェックポイント完了（checkpoint complete）」（ＣＯＭＰ）ビットが、ストア・キュー設計に装備される。ストア・バッファ内のデータは、それに直接に関連するプロセッサだけが使用可能である。他のプロセッサは、このデータが、すべての他のプロセッサの共用であるＬ２キャッシュまたはメモリに書き込まれるまで、このデータにアクセスすることができない。しかし、この従来技術の手法は、１サイクル毎に複数の外部命令（以下、「ＣＩＳＣ命令」または「外部ＣＩＳＣ命令」とも称する）をチェックポインティングすることが必要な時に、弱みを有する。すなわち、１サイクル毎に多くとも１つの外部命令しかチェックポインティングすることができない。
【０００５】
【発明が解決しようとする課題】
したがって、本発明の目的は、１サイクル毎に複数の外部命令をチェックポインティングすることができる、スーパースカラ・システムをチェックポインティングする改良された方法およびシステムを提供することである。
【０００６】
【課題を解決するための手段】
プロセッサが、１サイクル毎にある（最大の）個数の内部命令（以下、「ＲＩＳＣ命令」または「ＲＩＳＣ様命令」とも称する）をリタイヤでき、外部ＣＩＳＣ命令を表す内部命令の数が固定されておらず、たとえば動作コードに依存する場合に、プロセッサの状態のチェックポインティングが、複数の外部命令に基づく可能性がある。
【０００７】
簡単な例を図１に示す。外部命令ＩＤ４０（以下、「命令識別子」または単に「ＩＤ」とも称する）によって一意に識別されるプロセッサのＣＩＳＣ命令４２を、設計済みレジスタに作用する１つから４つまでの内部命令４４と、キャッシュ／メモリからのデータのフェッチ（fetch）およびストアを扱う１つから４つまでのロード／ストア命令４６に変換できると仮定する。
【０００８】
最大４つの内部命令を同時にリタイヤできるという仮定は、プロセッサの状態のスナップショットをすべてのサイクルにとる場合に、４つまでの外部ＣＩＳＣ命令をチェックポインティングしなければならないことを意味する。
【０００９】
プロセッサの設計済みレジスタのチェックポインティングは、レジスタ内容を、すべてのレジスタがマスタ・コピーを有するチェックポインティング・アレイにコピーすることによって行うことができる。ストア・データのチェックポインティングは、ストア・データが低位（たとえばＬ１）キャッシュに最初にライト・バックされる可能性があるが、チェックポイントの完了時に高位キャッシュ・メモリ（たとえばＬ２）内で解放される、メモリ階層に基づく可能性がある。レジスタ・ベースの内部命令および対応するストア命令を、外部命令に関係させることができることを保証するために、これらの命令に、一意の命令識別子番号（ＩＤ）を用いてタグを付けなければならない。
【００１０】
添付の独立請求項に記載された特徴によって、本発明の上述の目的が達成される。本発明のさらなる有利な配置および実施形態は、それぞれの従属項に記載されている。追加された請求項を参照してはならない。
【００１１】
最も広義の態様によれば、エラー・リカバリを達成するために、単一プロセッサまたは複数プロセッサのデータ処理システムをチェックポインティングする方法が提供される。この方法には、
ａ）チェックポイント状態バッファ内に、それぞれの複数のＣＩＳＣ／ＲＩＳＣ命令によって実行される、レジスタ内容の所定の最大個数の更新（たとえば、最大４個のレジスタ更新）を収集するステップであって、チェックポイント状態が、
前記複数の（ＣＩＳＣから導出された）命令によって更新され得るレジスタと同数のバッファリング・スロットと、
前記複数のＣＩＳＣ命令のうち最も若いＣＩＳＣ命令に関連するプログラム・カウンタ値の項目と
を含む、ステップと、
ｂ）現在収集されているレジスタ・データ内でエラーが検出されなかったことを判定した後で、前記最も若いＣＩＳＣ命令の完了の前または完了と共に、前記レジスタ・データを用いて設計済みレジスタ・アレイ（ＡＲＡ）を更新するステップと
が含まれる。
【００１２】
したがって、この利点は、それぞれがレジスタに対して動作する複数の外部命令を含む命令シーケンスを、１サイクル毎にチェックポインティングできることをもたらす。
【００１３】
したがって、本発明は、１サイクル毎に１つまたは複数の外部ＣＩＳＣ命令によって実行されるレジスタ内容の更新が、前記チェックポイント状態を形成することによって収集されるという発想に基づく。チェックポイント状態は、１サイクル毎に１つまたは複数のＣＩＳＣ命令によって更新され得るレジスタと同数のスロットからなることが好ましい。さらに、すべての命令によって、プロセッサの状況、たとえばプログラム・カウンタが更新される。複数の命令のチェックポインティングのためには、最後の状況だけが重要であり、たとえば、複数の外部ＣＩＳＣ命令が同時に完了する場合には、外部命令のシーケンス内で最後に完了した完了によって、プログラム・カウンタが決定される。
【００１４】
エラーがプログラム実行時に検出されない場合には、このチェックポイント状態が、チェックポインティング・アレイの更新を実行するのに（最終的に）使用される。この更新は、チェックポイント状態が作成されてから数サイクル後に行われる可能性がある。チェックポイント状態が、チェックポインティング・アレイ、たとえば上のＡＲＡの更新に最終的に使用されるまで、複数のチェックポイント状態を、チェックポイント状態バッファ（ＣＳＢ）に収集することができ、ＣＳＢには、すべてのサイクルに新しいチェックポイント状態が収集される。プロセッサ内のエラーが検出される場合には、チェックポインティング・アレイ更新機構が、即座にブロックされ、したがって、破壊されたデータがチェックポインティング・アレイを汚染しなくなる。
【００１５】
さらに、この発明的方法に、ＡＲＡ項目にエラー検出および訂正（ＥＣＣ）ビットを設けるステップが含まれる時に、効率的で面積を節約するエラー訂正機構が、ビット障害に対して設けられる。
【００１６】
この発明的方法に、さらに、
ａ）複数のＳＴＯＲＥ命令からの結果のＳＴＯＲＥデータの、ストア・バッファ（ＳＴＢ）から設計済み状態キャッシュ・メモリへの解放を制御する第２制御パスを、前記ＡＲＡ更新と並列に提供するステップと、
ｂ）前記最も若いＣＩＳＣ命令のＩＤを用いて前記チェックポイント状態バッファ項目にタグを付けることによって、前記ＳＴＯＲＥデータ解放を前記ＡＲＡ更新と同期化するステップと、
ｃ）前記最も若いＣＩＳＣ命令のＩＤより古いかまたはそれと等しいＩＤを有するデータだけを設計済み状態キャッシュ・メモリに解放するステップと
が含まれる。この場合には、上述のシーケンスに１つまたは複数のＳＴＯＲＥ命令も含めることができるという長所がもたらされる。したがって、レジスタ操作命令とキャッシュ操作命令の混合されたシーケンスを、１サイクル毎にチェックポインティングすることができる。したがって、この発明的概念は、レジスタ更新命令だけに焦点を合わせるように制限はされない。
【００１７】
チェックポイント状態更新とキャッシュ／メモリへのデータのストアの間の同期化に関する基本的な発想は、すべてのチェックポイント状態が、チェックポインティングされるシーケンス内の最後の外部命令のＩＤを用いてタグを付けられるということである。ストア・データは、対応する命令ＩＤのＩＤを用いてタグを付けられる。すべてのＳＴＯＲＥデータは、キャッシュ／メモリに解放されるまで前記ストア・バッファに保持される。同期化が得られるのは、チェックポインティング・アレイの更新に使用された最後のチェックポイント状態のＩＤと比較してより古いか等しいＩＤを所有するＳＴＯＲＥデータだけが、システム・メモリ（すなわちＬ２キャッシュ）に解放される場合である。まだチェックポインティング・アレイにチェックポインティングされていない命令に対応するストア・データは、対応するチェックポイント状態がチェックポインティング・アレイの更新に使用されるまで、ストア・バッファに保持される。したがって、チェックポインティング・アレイの内容およびシステム・メモリに保管されたデータが、いつでも一貫性を有することが保証される。プロセッサの内部でエラーが発生する場合に、破壊されたデータは、システム・メモリに入っていない。リカバリで、チェックポインティング・アレイを使用し、プロセッサ状態、たとえばプログラム・カウンタを復元することによって、設計済みレジスタが復元される場合に、そのプロセッサは、システム・メモリ内のデータに損傷を与えずに、プログラム実行を再始動することができる。
【００１８】
さらに、上の同期化ステップに、ＡＲＡ更新制御とＳＴＯＲＥデータ解放制御の間のダブル・ハンドシェーク動作が含まれる。このダブル・ハンドシェーク動作は、
ａ）少なくとも前記最も若いＣＩＳＣ命令に関連するそれぞれのＳＴＯＲＥデータが前記ストア・キューに常駐する時に、前記最も若いＣＩＳＣ命令のＩＤを前記ＡＲＡ更新制御にシグナリングする第１ステップであって、これによって、前記シグナリングされる最も若いＣＩＳＣ命令のＩＤと比較してより古いＩＤを有するレジスタ命令を含むＡＲＡ更新がトリガされる、第１ステップと、
ｂ）最も新しいＡＲＡ更新に関連する前記最も若いＣＩＳＣ命令のＩＤをＳＴＯＲＥデータ解放制御にシグナリングして、前記ストア・バッファから前記設計済み状態キャッシュ・メモリへのＳＴＯＲＥデータ解放をトリガする第２ステップであって、前記解放が、前記シグナリングされる最も若いＣＩＳＣ命令のＩＤと比較してより古いＩＤを有する命令から生じるＳＴＯＲＥデータを含む、第２ステップと
が含まれる。この場合には、同期化の好ましい方法が提供される。というのは、これによって、効率的で一貫性のあるチェックポインティング・システムが提供されるからである。さらなる詳細を、下で図６に関して示す。
【００１９】
この発明的概念を、チェックポイント状態バッファ内に、それぞれの複雑なＣＩＳＣ命令によって（たとえば、１６個までのレジスタを更新するＬＯＡＤＭＵＬＴＩＰＬＥ命令として）実行される、所定の拡張された最大個数のレジスタ内容の更新（たとえば、最大１６個のレジスタ更新）を収集するように有利に拡張することが好ましい。この拡張は、
ａ）前記レジスタ更新データを受け取るために、それぞれの拡張された複数のチェックポイント状態バッファ項目（たとえば、１６／４＝４項目）を予約するステップと、
ｂ）グルー・ビットを用いて、１つの同一のＣＩＳＣ命令に関連する後続項目をマークするステップと、
ｃ）複数のサイクルのアトミック・オペレーションで、このように拡張されたチェックポイント状態を更新するステップと
を提供することによって行われる。
【００２０】
この特徴を用いると、４つを超える内部命令に変換される必要がある非常に複雑な外部ＣＩＳＣ命令を、アトミックな形すなわち、成功裡に完了されるか全く開始されない形であるが、たとえばチェックポイント・サイクル中の電源障害などの動作障害がないと仮定して、複数のサイクル以内に、チェックポインティングすることができる。
【００２１】
上で述べた変形形態の１つまたは複数のステップを実行する論理回路手段を有するプロセッサ・ユニットを提供し、所定の最大個数のレジスタ内容の更新を収集する手段が、それぞれが命令ＩＤ、ターゲット・レジスタ・アドレス、ターゲット・レジスタ・データ、およびプログラム・カウンタを含むことが好ましい複数の項目を有するチェックポイント状態バッファであるようにすることが好ましい。論理チェックポイント状態に、複数の、好ましくは４つの、そのような項目が含まれる（図２から４を参照されたい）。
【００２２】
この形で、各内部命令によって、基本的に１つのレジスタを更新することができ（これはしばしば発生する）、バッファは、４つまでの内部命令を受け取るのに十分な大きさである。このＣＳＢバッファ編成は、面積消費と、向上したランタイム安定性によって暗示される性能利得との間のよい妥協であることがわかった。
【００２３】
したがって、当業者は、本発明が、複数の外部ＣＩＳＣ命令をチェックポインティングする新しい方式を提案し、設計済みレジスタ内容とキャッシュ／システム・メイン・メモリに保管されたデータとの間の一貫性を保証することを諒解するであろう。
【００２４】
本発明を、例によって示すが、本発明は、添付図面の図の形状によって制限されない。
【００２５】
【発明の実施の形態】
全般的に図面を参照し、ここでは特に図２を参照すると、発明的概念を例示するために、例示的プロセッサ・アーキテクチャが選択されている。これは、レジスタ・データが、キャッシュにストアまたはロードされるデータと別に操作されるプロセッサ・アーキテクチャである。レジスタ・データ「ストリーム」が、命令ウィンドウ・バッファ（Instruction Window Buffer、ＩＷＢ）と称するプロセッサ部分で操作され、ストアまたはロードのデータ「ストリーム」が、異なる部分すなわちストレージ・ウィンドウ・バッファ（Storage Window Buffer、ＳＷＢ）で操作される。チェックポインティングを考える時には、両方のデータ・ストリームが、一貫性を有するようにしなければならない。しかし、この発明的概念は、チェックポインティング制御が他のシステム・メモリ・データとの一貫性を有するレジスタ・データを保持することが必要である限り、おそらくデータ分離の他の判断基準に従う異なる方式を実施する他のすべてのタイプのプロセッサ・アーキテクチャを包含することを理解されたい。
【００２６】
次に、そのようなＩＷＢ／ＳＷＢアーキテクチャに有利なチェックポイント状態定義を、詳細に説明する。
【００２７】
チェックポイント状態には、１サイクル毎に更新され得るレジスタと同数のスロットが含まれる。したがって、１例では、１つのＣＩＳＣ命令が、４つのレジスタのうちの複数を更新することができる。したがって、チェックポイント状態に、４つのスロットすなわち、スロット０、スロット１、スロット２、およびスロット３と、プログラム・カウンタを保管するための追加のスロットが含まれる。４つのターゲット・スロットのそれぞれに、レジスタ・アドレス１０およびそれぞれのレジスタ・データ１２が含まれる。
【００２８】
この例示的なチェックポイント状態定義を用いて、４つまでの外部ＣＩＳＣ命令を表す４つの内部命令を、１サイクル毎にリタイヤすることができる。これは、すべてのサイクルに４つまでのレジスタが更新されることを意味する。さらに、すべての命令によって、プロセッサの状態、たとえばスロットに保管されるプログラム・カウンタ１４が更新される。
【００２９】
本発明による複数の命令のチェックポインティングのためには、最終的な状況だけが重要である、すなわち、複数の外部ＣＩＳＣ命令が同時に完了される場合に、プログラム・カウンタは、外部命令のシーケンス内で最後に完了された命令によって決定される。
【００３０】
次に、２つの例を示し、それぞれ図３および４に関して説明する。
【００３１】
あるＣＩＳＣ命令（図３の上側の第１列を参照されたい）が、上側の表の４行に対応する４つのＲＩＳＣ様命令に変換されるが、このＣＩＳＣ命令では、第１および第３のＲＩＳＣ命令によって２つのターゲット・レジスタに書き込む。したがって、これが、チェックポイント状態の２スロットを占め、下側の部分で、左端の２つのスロットが、図１に示されたように構成される。プログラム・カウンタ１４によって、次の順次命令の命令アドレスが決定される。チェックポイント状態の２つのスロット（すなわち左端の２スロット）が、使用され、有効ビットによってマークされる。
【００３２】
図４に示されたもう１つの例では、３つのＣＩＳＣ命令ＣＩＳＣ＃０、…、ＣＩＳＣ＃２が、４つのＲＩＳＣ様命令に変換されると仮定する。具体的に言うと、最初のＣＩＳＣ＃０命令が、図４の上部の上の２つの行に対応する２つの命令に変換され、それに続くＣＩＳＣ＃１命令およびＣＩＳＣ＃２命令が、それぞれが表の１つの行だけを占める単一のＲＩＳＣ様命令によって表される。このシーケンスでは、４つのレジスタが更新され、したがって、対応するチェックポイント状態は、４つすなわち使用可能なすべてのスロットを使用する。このチェックポイントの状況情報は、このチェックポインティングされたＣＩＳＣ命令のシーケンスの最後の命令から導出される、すなわち、プログラム・カウンタが、第３の命令であるＣＩＳＣ＃２から抽出される。
【００３３】
ここで図５および図６を参照すると、追加の同期化機構（ダブル・ハンドシェーク）特徴が開示されており、これは、たとえば命令オペランドなどのレジスタ・データおよびたとえばメモリに保管される変数などの上述のＳＴＯＲＥデータが、１つの同一のＣＩＳＣ命令によって使用される、マイクロプロセッサ・アーキテクチャに有利に適用することができる。
【００３４】
基本的に、すべてのチェックポイント状態（基本については図２を参照されたい）が、チェックポインティングされたシーケンスの最後の外部命令の命令ＩＤ４０を用いて有利にタグを付けられる。これを、図５および図６に示す。
【００３５】
図６では、右側のＳＷＢ部分に、すべてのＳＴＯＲＥデータが含まれる。この例示的なＳＴＯＲＥデータ処理アーキテクチャでは、たとえばプログラム変数に関連するデータなどのデータが、ストア・キュー（ＳＴＱ）６２から来る。ＳＴＯＲＥデータは、ストアスルー（またはライトスルー）Ｌ１キャッシュ６１ａおよびＥＣＣ生成６１ｂに送られ、ＥＣＣ生成６１ｂは、たとえば４ワードをカバーし、新しいＳＴＯＲＥデータを含む完全なＬ１キャッシュ・ラインに有利に対応し、ＳＴＯＲＥデータは、その後、命令実行の後にＥＣＣ処理される。その後、それぞれのＥＣＣ生成６１ｂが、ストア・バッファ６５内でエラーなしでバッファリングされ、ストア・バッファ６５は、やはりＬ１キャッシュ部分に配置できることが好ましいが、代替案ではＬ１キャッシュ部分の近くに配置される。
【００３６】
その後、前記データが、Ｌ２キャッシュ６６またはメモリ階層の他の適合された部分に、レジスタ・データの解放と同期化された形で解放される。前記ＳＴＯＲＥデータ処理は、ＥＣＣ処理中に最終的に訂正される可能性があるデータが、オフ・チップで、たとえばＬ２キャッシュ内に配置されるデータのＥＣＣ処理と比較してより高速にオンチップで完全に処理されるので、好ましい。しかし、左側のＩＷＢ部分に、チェックポイント状態バッファ（ＣＳＢ）６０および基本的にチェックポインティング・アレイ６４が含まれ、チェックポインティング・アレイ６４には、エラー検出されエラー訂正された設計済みレジスタ・データ（error-checked and error-corrected architected Register）が含まれる。したがって、これを、（ＥＣＣ−ＡＲＡ）と略す。
【００３７】
一般に、チェックポインティング・アレイ６４（ＥＣＣ−ＡＲＡ）へのチェックポインティングは、アトミック手順で行われる。「アトミック」は、１つまたは複数のプロセッサの信頼性のあるリセット点を保証するために、チェックポイント手順が、完全に行われるすなわち成功裡に完了すると期待されなければならず、さもなければ、開始されることすら許可されないことを意味する。
【００３８】
一般に、チェックポイントが完了されない限り、チェックポインティング・アレイ６４のＥＣＣ−ＡＲＡへの読取アクセスもＥＣＣ−ＡＲＡに関連するリセット機能も不可能である。これによって、完了した外部命令だけがＥＣＣ−ＡＲＡにチェックポインティングされることが保証される。
【００３９】
具体的に言うと、チェックポインティング・アレイ６４の更新と対応する「設計済み」メモリ部分、Ｌ２キャッシュ６６の更新の間の好ましい同期化（ダブル・ハンドシェーク）方式は、次の通りである。チェックポインティング・アレイ６４に保管されたレジスタ内容をＬ２キャッシュ６６に保管されたデータと同期化するために、ＳＴＱ６２とＣＳＢ６０の間で第１の同期化が確立され、チェックポインティング・アレイ６４（の制御ロジック）とストア・バッファ６５の間でもう１つの第２の同期化が確立される。
【００４０】
空のＳＴＱ６２を仮定すると、チェックポインティングは、完全なチェックポイント状態がＣＳＢ６０内で使用可能である場合に、必ず発生する。したがって、すべてのデータ、具体的には図３および図４に関して上で説明したレジスタ・アドレス１０およびレジスタ・データ１２が、存在しなければならない。
【００４１】
ＳＴＯＲＥデータが、ＳＴＱ６２内で見つかる場合に、命令識別子ＩＤが、ＣＳＢ６０にシグナルされ（第１ハンドシェーク信号の矢印６８を参照されたい）、そのチェックポイント状態を、チェックポインティング・アレイ６４（ＥＣＣ−ＡＲＡ）にチェックポインティングできるようになる。したがって、これは、これがそれに応じてシグナルされる時に読取についてＣＳＢ６０にアクセスし、読取／書込アクセスについてチェックポインティング・アレイ６４にアクセスするように配置された読取ポート６３を含む制御ロジックを介して行われることが好ましい。
【００４２】
チェックポイント状態が、チェックポインティング・アレイ６４に完全にチェックポインティングされた時に、対応するＩＤが、ＣＳＢ６０から、または好ましくは読取ポート６３を介してチェックポインティング・アレイ６４から読み出され、ＥＣＣ保護されたＳＴＯＲＥデータを含むストア・バッファ６５に送られる（第２ハンドシェーク信号の矢印６９を参照されたい）。ストア・バッファ６５のそれぞれの項目に保管されたＳＴＯＲＥデータは、それぞれのＩＤがチェックポインティング・アレイ６４から受け取られた場合に限ってＬ２キャッシュに解放される。したがって、これは、ＩＤを受け取った後に行われる。当業者が諒解できるように（これらの規則に従う時に）Ｌ２キャッシュ６６のメモリ内容は、必ず、チェックポインティング・アレイ６４で見つかるレジスタ・データとの一貫性を有する。同一の命令に関連するデータは、前記記憶手段すなわちＣＳＢ６０、チェックポインティング・アレイ６４、ＳＴＱ６２、ストア・バッファ６５、およびＬ２キャッシュ６６のそれぞれで同一のＩＤを有する。
【００４３】
言い換えると、同期化は、基本的に、チェックポインティング・アレイ６４の更新に使用された最後のチェックポイント状態のＩＤと比較してより古いか等しいＩＤを有する特定のＳＴＯＲＥデータが、システム・メモリに解放される場合に限って得られる。まだチェックポインティング・アレイにチェックポインティングされていない命令に対応するストア・データは、対応するチェックポイント状態がチェックポインティング・アレイ６４の更新に使用されるまで保持される。したがって、チェックポインティング・アレイの内容と、Ｌ２キャッシュまたはメモリに保管されたデータが、いつでも一貫性を有することが保証される。
【００４４】
プロセッサの内部でエラーが発生する場合に、破壊されたデータは、Ｌ２キャッシュまたはメモリに入っていない。リカバリで、チェックポインティング・アレイ６４を使用し、プロセッサ状態、たとえばプログラム・カウンタを復元することによって設計済みレジスタが復元される場合に、プロセッサは、システム・メモリ内のデータに損傷を与えずにプログラム実行を再始動することができ、Ｌ１キャッシュの内容をパージしなければならない。最後にチェックポインティングされた命令のＩＤに関してより古いすべての項目も、ストア・バッファ６５内で消去されなければならないことに留意されたい。
【００４５】
ＣＳＢ６０が満杯の時に、命令コミッタが停止され、したがって、新しい命令がコミットされないことを追加しなければならない。これは、ＳＴＱ６２がストア・バッファ６５にデータをストアするまでのプロセッサの停止につながる。
【００４６】
さらに、チェックポインティング・アレイ６４の更新は、チェックポイント状態が作成されてから数サイクル後に発生する可能性がある。この時間の間に、チェックポイント状態がチェックポインティング・アレイ６４の一貫性のある更新に最終的に使用されるまで、複数のチェックポイント状態（好ましくは１サイクル毎に１つ）を、前記チェックポイント状態バッファ（ＣＳＢ）に収集することができる。別に設けられ本発明の対象ではないなんらかの従来技術のエラー検出ロジックによるエラーの認識の際に、ＣＳＢ６０に保管された次のチェックポイント状態のチェックポインティングが、即座にブロックされる。したがって、エラーが、チェックポインティング・アレイ６４に影響せず、Ｌ２キャッシュに保管された「正しい」データを破壊しないことが保証される。
【００４７】
さらに、図７を参照して、４つを超える内部命令からなる非常に複雑な外部命令のためのチェックポイント状態の拡張を、さらに開示する。
【００４８】
非常に複雑な外部（ＣＩＳＣ）命令を、本発明の開示の利益を得ながら４つを超える内部命令に変換する必要がある場合には、基本的な技術的発明的特徴を放棄せず、それぞれの増加した数の内部命令を受け取るためにはるかに広く、したがってより面積を消費するＣＳＢ６０を設けることに制限されずに、発明的方式を拡張することができる。
【００４９】
この前提の下で、複雑な命令のリタイヤは、最大４つの内部命令だけを１サイクル毎にリタイヤできる場合に、複数サイクルにわたって続く。これは、すべてのサイクルに作成されるチェックポイント状態が、１つの完全な外部命令を表さず、したがって、これをチェックポインティング・アレイの更新に使用してはならないことを意味する。
【００５０】
この問題に対する解決策は、複数のチェックポイント状態が、複雑な外部命令全体を表す単一の「拡張チェックポイント状態（extended checkpoint state）」を形成するとみなすという発想に基づく。
【００５１】
好ましい実施形態によれば、これらの不完全なチェックポイント状態が、ＣＳＢ６０の複数の項目を占める。前記複数の項目が、シーケンスを形成することが好ましい。ストア・バッファ６５の各項目内のある位置に設けられるグルー／リンク・ビット７２によって、ストア・バッファ６５に保管された連続するチェックポイント状態が拡張チェックポイント状態を形成することをマークする（図７を参照されたい）。したがって、ストア・バッファ６５が、少なくとも、すべての可能な完全なＣＩＳＣ命令の拡張チェックポイント状態を表すのに必要なものと同数の項目を所有することが必要である。
【００５２】
さらに、チェックポインティング・アレイ６４の更新（図６を参照されたい）は、アトミック・オペレーションが完全に更新されるまで割り込まれてはならない。
【００５３】
エラーの検出時には、アトミック・オペレーションが完全にチェックポインティングされるまで、更新機構をブロックしてはならない。これは、拡張チェックポイント状態のチェックポインティング・アレイ６４へのチェックポインティングが、複数サイクルにわたって続く場合があることも意味する。
【００５４】
前述の明細書で、本発明を、その特定の例示的実施形態に関して説明した。しかし、請求項に記載された本発明の広義の趣旨および範囲から逸脱せずに、さまざまな修正および変更を行えることは明白である。したがって、本明細書および図面は、制限的な意味ではなく、例示的であるものとみなされなければならない。
【００５５】
当業者は、本発明が、たとえば同時にリタイヤされる複数のＣＩＳＣ命令に関するチェックポインティング・アレイ６４などのチェックポインティング・バッファ手段の内容を更新し、プロセッサ・レジスタ内容およびシステム・メモリ内のデータの最も正確なチェックポイントを達成するために、たとえばＬ２キャッシュ６６などのキャッシュ階層へのＳＴＯＲＥデータの解放とチェックポインティング・バッファ手段の更新を同期化する、新しく有利な方式を提案することを諒解するであろう。チェックポインティングの正確な方法によって、メモリに保管されたデータを破壊せずに、プロセッサ内のソフト・エラーまたはハード・エラーの検出時のプロセッサのリカバリが可能になる。これは、高められたレベルのコンピューティング要件を満足するのに重要とみなされる。
【図面の簡単な説明】
【図１】発明的命令ＩＤを導入する、ＣＩＳＣ命令から複数のＲＩＳＣ命令への変換の概念を示す概略図である。
【図２】本発明によるチェックポイント状態定義の概略表現を示す図である。
【図３】２つのレジスタ更新を有する２つの内部命令に変換されるＣＩＳＣ命令を含むチェックポイント状態の概略表現を示す図である。
【図４】３つのＣＩＳＣ命令が４つの内部命令と２＋１＋１レジスタ更新に変換される、図２による表現を示す図である。
【図５】最後のチェックポインティングされたＣＩＳＣ命令のＩＤを用いてタグを付けられたチェックポイント状態の概略表現を示す図である。
【図６】本発明による、レジスタ・チェックポインティングとＳＴＯＲＥデータ処理の同期化を示す概略表現図である。
【図７】本発明による拡張されたチェックポイント状態の原理を示す概略表現図である。
【符号の説明】
１０レジスタ・アドレス
１２レジスタ・データ
１４プログラム・カウンタ
４０命令ＩＤ
４２ＣＩＳＣ命令
４４内部命令
４６ロード／ストア命令
６０チェックポイント状態バッファ（ＣＳＢ）
６１ａＬ１キャッシュ
６１ｂＥＣＣ生成
６２ストア・キュー（ＳＴＱ）
６３読取ポート
６４チェックポインティング・アレイ
６５ストア・バッファ
６６Ｌ２キャッシュ
７２グルー／リンク・ビット

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a checkpointing method and system for a multi-processor data processing system to provide error recovery.
[0002]
[Prior art]
To allow high instruction-level parallelism in modern processors, multiple instructions can be executed in parallel and eventually retired. This is essential when the complex instructions of a CISC processor are converted into multiple simpler RISC-like instructions and when the number of instructions executed per cycle (IPC) must be high. Retiring these instructions means that the contents of the architected register array are updated with the results of the internal instructions and that the corresponding store data is written back to the cache/memory. To reflect the instruction sequence given by the program, retirement, that is, instruction completion, occurs in conceptual order. The terms “young” and “old” therefore denote instructions found later and earlier in the instruction sequence, respectively. Checkpointing means that a snapshot of the state of the architected registers and of the corresponding data stored in the data cache is taken at a certain frequency, that is, at fixed time intervals. The highest resolution is obtained when a snapshot is taken every cycle. Such a prior art checkpointing method is disclosed in US Pat. No. 5,418,916. Its checkpoint retry function uses a store buffer to establish a store queue during normal operation and to supply the data needed for checkpoint retry during a retry operation. The data buffered there includes the register data of the floating-point registers, general-purpose registers, and access registers, as well as the program status word.
[0003]
This is basically done with the help of a plurality of store buffers, each associated with the L1 cache of a processing unit. Each store buffer serves as an intermediate buffer that holds storage data until it can be released to another part of the storage hierarchy, where other CPUs can access it.
[0004]
To control the release of storage data, two information bits, an “end of instruction” (EOI) bit and a “checkpoint complete” (COMP) bit, are provided in the store queue design. Data in the store buffer is available only to the processor directly associated with it. Other processors cannot access this data until it has been written to the L2 cache or memory shared by all processors. However, this prior art approach has a weakness when multiple external instructions (hereinafter also called “CISC instructions” or “external CISC instructions”) must be checkpointed per cycle: at most one external instruction can be checkpointed per cycle.
[0005]
[Problems to be solved by the invention]
Accordingly, it is an object of the present invention to provide an improved method and system for checkpointing a superscalar system that can checkpoint multiple external instructions per cycle.
[0006]
[Means for Solving the Problems]
If a processor can retire a certain (maximum) number of internal instructions (hereinafter also called “RISC instructions” or “RISC-like instructions”) per cycle, and the number of internal instructions representing an external CISC instruction is not fixed but depends, for example, on the operation code, then checkpointing of the processor state may be based on multiple external instructions.
[0007]
A simple example is shown in FIG. 1. Assume that a processor CISC instruction 42, uniquely identified by an external instruction ID 40 (hereinafter also called the “instruction identifier” or simply “ID”), can be converted into one to four internal instructions 44 that operate on the architected registers and one to four load/store instructions 46 that handle fetching data from and storing data to the cache/memory.
[0008]
The assumption that up to four internal instructions can be retired simultaneously means that up to four external CISC instructions must be checkpointed when taking a snapshot of the processor state every cycle.
[0009]
The processor's architected registers can be checkpointed by copying the register contents to a checkpointing array in which every register has a master copy. Checkpointing of store data can be based on the memory hierarchy: store data may first be written back to a low-level cache (e.g., L1) but is released into a higher-level cache memory (e.g., L2) upon completion of the checkpoint. To ensure that the register-based internal instructions and the corresponding store instructions can be related to their external instruction, these instructions must be tagged with a unique instruction identifier number (ID).
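The ID tagging described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `InternalOp` and `decode` and the textual operation descriptions are hypothetical, and the limits of one to four operations per group simply follow the FIG. 1 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalOp:
    instr_id: int  # ID of the originating external (CISC) instruction
    kind: str      # "reg" = register operation, "ldst" = load/store operation
    text: str      # human-readable description, for illustration only


def decode(instr_id, reg_ops, ldst_ops):
    """Convert one external instruction into internal ops that share its ID."""
    for group in (reg_ops, ldst_ops):
        if not 1 <= len(group) <= 4:
            raise ValueError("the FIG. 1 example assumes 1 to 4 ops per group")
    tagged = [InternalOp(instr_id, "reg", t) for t in reg_ops]
    tagged += [InternalOp(instr_id, "ldst", t) for t in ldst_ops]
    return tagged


# Hypothetical external instruction with ID 7: one register op, one store.
ops = decode(7, ["r1 <- r1 + r2"], ["store r1 -> [addr]"])
# Because every op carries the ID, register ops and stores can be related
# back to the same external instruction later on.
related = [op for op in ops if op.instr_id == 7]
```

Carrying the ID on both kinds of operation is what later allows register updates and store data to be matched, even though they travel through different parts of the processor.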
[0010]
The above-mentioned objects of the invention are achieved by means of the features described in the attached independent claims. Further advantageous arrangements and embodiments of the invention are described in the respective dependent claims. Do not refer to the appended claims.
[0011]
According to the broadest aspect, a method is provided for checkpointing a single-processor or multi-processor data processing system to achieve error recovery. The method includes:
a) collecting, in a checkpoint state buffer, a predetermined maximum number of register content updates (for example, up to four register updates) performed by a respective plurality of CISC/RISC instructions, wherein the checkpoint state includes
as many buffering slots as there are registers that can be updated by the plurality of (CISC-derived) instructions, and
an entry for the program counter value associated with the youngest CISC instruction of the plurality of CISC instructions; and
b) updating the architected register array (ARA) with the register data, before or together with completion of the youngest CISC instruction, after determining that no error was detected in the currently collected register data.
[0012]
The advantage is that an instruction sequence containing multiple external instructions, each operating on registers, can be checkpointed per cycle.
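Steps a) and b) can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the ARA is represented as a plain dictionary, the class and function names are hypothetical, and `MAX_SLOTS = 4` is the example value given in the text.

```python
from dataclasses import dataclass, field

MAX_SLOTS = 4  # example value from the text: up to four register updates


@dataclass
class CheckpointState:
    slots: list = field(default_factory=list)  # (register address, data) pairs
    program_counter: int = 0                   # PC of the youngest instruction

    def collect(self, reg_addr, reg_data, pc):
        """Step a): collect one register update into a free slot."""
        if len(self.slots) >= MAX_SLOTS:
            raise OverflowError("checkpoint state is full")
        self.slots.append((reg_addr, reg_data))
        self.program_counter = pc  # a younger instruction overwrites the PC


def update_ara(ara, state, error_detected):
    """Step b): apply the state to the ARA only if no error was detected."""
    if error_detected:
        return False
    for reg_addr, reg_data in state.slots:
        ara[reg_addr] = reg_data
    ara["pc"] = state.program_counter
    return True


ara = {}
state = CheckpointState()
state.collect(1, 0xAA, pc=0x100)  # update from an older instruction
state.collect(3, 0xBB, pc=0x104)  # update from the youngest instruction
committed = update_ara(ara, state, error_detected=False)
```

Only the program counter of the youngest collected instruction survives in the state, which mirrors the single program counter entry per checkpoint state.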
[0013]
The present invention is thus based on the idea that the register content updates performed by one or more external CISC instructions per cycle are collected by forming the checkpoint state. The checkpoint state preferably consists of as many slots as there are registers that can be updated by one or more CISC instructions per cycle. In addition, every instruction updates the processor status, for example the program counter. For checkpointing multiple instructions, only the final status matters: for example, if several external CISC instructions complete simultaneously, the program counter is determined by the instruction that completes last in the sequence of external instructions.
[0014]
If no error is detected during program execution, this checkpoint state is (eventually) used to update the checkpointing array. This update may take place several cycles after the checkpoint state was created. Until a checkpoint state is finally used to update the checkpointing array, for example the ARA mentioned above, multiple checkpoint states can be collected in a checkpoint state buffer (CSB), which receives a new checkpoint state every cycle. If an error is detected in the processor, the checkpointing array update mechanism is blocked immediately, so that corrupted data cannot contaminate the checkpointing array.
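The buffering and blocking behavior can be modeled as a simple first-in, first-out queue. The sketch below is an assumption-laden illustration, not the patented circuit: checkpoint states are plain dictionaries mapping register numbers to values, and the class name, method names, and the capacity of 8 are hypothetical.

```python
from collections import deque


class CheckpointStateBuffer:
    def __init__(self, capacity=8):  # capacity is an arbitrary example value
        self.entries = deque()
        self.capacity = capacity
        self.blocked = False

    def push(self, state):
        """Collect a new checkpoint state; False means the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(state)
        return True

    def report_error(self):
        """An error was detected: block all further array updates."""
        self.blocked = True

    def drain_one(self, ara):
        """Use the oldest state to update the checkpointing array."""
        if self.blocked or not self.entries:
            return False
        ara.update(self.entries.popleft())
        return True


csb = CheckpointStateBuffer()
ara = {}
csb.push({1: 10})        # state created in cycle n
csb.push({2: 20})        # state created in cycle n+1
first = csb.drain_one(ara)   # the array update may lag by several cycles
csb.report_error()
second = csb.drain_one(ara)  # blocked: the second state never reaches the ARA
```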
[0015]
Furthermore, when the inventive method includes the step of providing the ARA entries with error detection and correction (ECC) bits, an efficient and area-saving error correction mechanism against bit failures is provided.
[0016]
The inventive method further includes:
a) providing, in parallel with the ARA update, a second control path that controls the release of STORE data resulting from a plurality of STORE instructions from the store buffer (STB) to the architected-state cache memory;
b) synchronizing the STORE data release with the ARA update by tagging the checkpoint state buffer entry with the ID of the youngest CISC instruction; and
c) releasing to the architected-state cache memory only data whose ID is older than or equal to the ID of the youngest CISC instruction.
This has the advantage that one or more STORE instructions can also be included in the above sequence. A mixed sequence of register-manipulating and cache-manipulating instructions can therefore be checkpointed per cycle. The inventive concept is thus not restricted to register update instructions only.
[0017]
The basic idea behind synchronizing checkpoint state updates with the storing of data to the cache/memory is that every checkpoint state is tagged with the ID of the last external instruction in the checkpointed sequence. Store data is tagged with the ID of the corresponding instruction. All STORE data is held in the store buffer until it is released to the cache/memory. Synchronization is achieved when only STORE data whose ID is older than or equal to the ID of the last checkpoint state used to update the checkpointing array is released to system memory (that is, the L2 cache). Store data belonging to instructions that have not yet been checkpointed into the checkpointing array is held in the store buffer until the corresponding checkpoint state has been used to update the checkpointing array. This guarantees that the contents of the checkpointing array and the data stored in system memory are consistent at all times. If an error occurs inside the processor, the corrupted data has not entered system memory. When, during recovery, the architected registers are restored by using the checkpointing array and restoring the processor state, for example the program counter, the processor can restart program execution without damaging the data in system memory.
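The release rule can be written as a short filter. This Python sketch makes two assumptions that the text does not state: instruction IDs are plain integers that grow with program order (so "older" means "smaller"), and ID wrap-around is ignored. The function name and the tuple layout `(instr_id, address, data)` are hypothetical.

```python
def release_stores(store_buffer, last_checkpointed_id):
    """Split buffered stores into those to release and those to hold.

    Only stores whose ID is older than or equal to the ID of the last
    checkpoint state applied to the checkpointing array may be released.
    """
    released = [s for s in store_buffer if s[0] <= last_checkpointed_id]
    held = [s for s in store_buffer if s[0] > last_checkpointed_id]
    return released, held


# Hypothetical buffer contents: (instruction ID, address, data)
store_buffer = [(4, 0x1000, 11), (5, 0x1008, 22), (6, 0x1010, 33)]
released, held = release_stores(store_buffer, last_checkpointed_id=5)
```

With this rule, the store of instruction 6 stays in the store buffer until a checkpoint state tagged with ID 6 or younger has updated the checkpointing array.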
[0018]
Furthermore, the above synchronization step includes a double handshake operation between the ARA update control and the STORE data release control. The double handshake operation includes:
a) a first step of signaling the ID of the youngest CISC instruction to the ARA update control when at least the respective STORE data associated with the youngest CISC instruction resides in the store queue, thereby triggering an ARA update that covers register instructions having an ID older than the signaled ID of the youngest CISC instruction; and
b) a second step of signaling the ID of the youngest CISC instruction associated with the most recent ARA update to the STORE data release control, to trigger the release of STORE data from the store buffer to the architected-state cache memory, the release covering STORE data that results from instructions having an ID older than the signaled ID of the youngest CISC instruction.
This provides a preferred method of synchronization, because it yields an efficient and consistent checkpointing system. Further details are given below with reference to FIG. 6.
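The two handshake steps can be simulated with a small Python model. This is a sketch under several assumptions, not the hardware described in the patent: IDs are increasing integers without wrap-around, "older" is taken to include "equal" (as in the description of the synchronization idea), pending checkpoint states are kept in ascending ID order, and all class and attribute names are hypothetical.

```python
class Core:
    def __init__(self):
        self.csb = []           # pending states: (youngest_id, {reg: value})
        self.store_buffer = []  # pending stores: (instr_id, address, data)
        self.ara = {}           # architected register array
        self.l2 = {}            # architected-state memory (e.g. L2 cache)
        self.last_checkpointed_id = None

    def first_handshake(self, signaled_id):
        """Store side -> ARA update control: stores up to this ID are buffered."""
        ready = [s for s in self.csb if s[0] <= signaled_id]
        self.csb = [s for s in self.csb if s[0] > signaled_id]
        for youngest_id, regs in ready:
            self.ara.update(regs)
            self.last_checkpointed_id = youngest_id
        if ready:
            self.second_handshake(self.last_checkpointed_id)

    def second_handshake(self, checkpointed_id):
        """ARA update control -> STORE release control: release up to this ID."""
        for instr_id, address, data in self.store_buffer:
            if instr_id <= checkpointed_id:
                self.l2[address] = data
        self.store_buffer = [s for s in self.store_buffer
                             if s[0] > checkpointed_id]


core = Core()
core.csb = [(1, {"r1": 5}), (3, {"r2": 6})]
core.store_buffer = [(1, 0x10, 5), (3, 0x20, 6)]
core.first_handshake(1)  # only the state tagged with ID 1 may be checkpointed
```

After the call, register `r1` and address `0x10` are both architected, while everything belonging to instruction 3 is still held back, so registers and memory stay consistent.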
[0019]
The inventive concept is preferably and advantageously extended to collect, in the checkpoint state buffer, a predetermined extended maximum number of register content updates (for example, up to 16 register updates) performed by a respective complex CISC instruction (for example, a LOAD MULTIPLE instruction that updates up to 16 registers). This extension is made by providing the steps of:
a) reserving a respective extended plurality of checkpoint state buffer entries (for example, 16/4 = 4 entries) to receive the register update data;
b) marking, by means of a glue bit, the successive entries associated with one and the same CISC instruction; and
c) updating the checkpoint state thus extended in a multi-cycle atomic operation.
[0020]
With this feature, a very complex external CISC instruction that must be converted into more than four internal instructions can be checkpointed atomically, that is, either completed successfully or never started, within multiple cycles, assuming that no operational failure such as a power failure occurs during the checkpoint cycle.
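The extension can be sketched in Python as follows. The encoding of the glue bit is an assumption made for illustration (set on every entry except the last one of a group); the patent text only states that the bit marks successive entries of one and the same CISC instruction. The function names and the dictionary layout are hypothetical.

```python
def split_into_entries(instr_id, updates, slots_per_entry=4):
    """Spread the register updates of one complex instruction over entries."""
    entries = []
    for start in range(0, len(updates), slots_per_entry):
        chunk = updates[start:start + slots_per_entry]
        is_last = start + slots_per_entry >= len(updates)
        entries.append({"id": instr_id, "slots": chunk, "glue": not is_last})
    return entries


def commit_extended(ara, entries, error_detected=False):
    """Apply all glued entries, or none of them (atomic update)."""
    if error_detected or not entries or entries[-1]["glue"]:
        return False  # faulty or incomplete group: do not start the update
    staged = dict(ara)
    for entry in entries:
        staged.update(entry["slots"])  # slots are (register, value) pairs
    ara.clear()
    ara.update(staged)
    return True


# A LOAD MULTIPLE-like instruction updating 16 registers: 16 / 4 = 4 entries.
updates = [(reg, reg * 100) for reg in range(16)]
entries = split_into_entries(9, updates)
ara = {}
done = commit_extended(ara, entries)
```

In hardware the update would take several cycles; the sketch compresses this into one call and only models the all-or-nothing property.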
[0021]
It is preferable to provide a processor unit having logic circuit means for performing one or more steps of the variants described above, in which the means for collecting a predetermined maximum number of register content updates is a checkpoint state buffer having a plurality of entries, each preferably containing an instruction ID, a target register address, target register data, and a program counter. A logical checkpoint state comprises a plurality, preferably four, of such entries (see FIGS. 2 to 4).
[0022]
In this way, each internal instruction can basically update one register (which happens frequently), and the buffer is large enough to receive up to four internal instructions. This CSB organization has been found to be a good compromise between area consumption and the performance gain implied by improved runtime stability.
[0023]
Those skilled in the art will therefore appreciate that the present invention proposes a new scheme for checkpointing multiple external CISC instructions and guarantees consistency between the architected register contents and the data stored in the cache/system main memory.
[0024]
The invention is illustrated by way of example, but the invention is not limited by the shape of the figures of the accompanying drawings.
[0025]
DETAILED DESCRIPTION OF THE INVENTION
Referring to the drawings in general and now to FIG. 2 in particular, an exemplary processor architecture has been selected to illustrate the inventive concept. In this architecture, register data is handled separately from the data stored to or loaded from the cache. The register data “stream” is handled in a processor portion called the Instruction Window Buffer (IWB), while the store or load data “stream” is handled in a different portion, the Storage Window Buffer (SWB). When considering checkpointing, both data streams must be kept consistent. It should be understood, however, that the inventive concept covers all other types of processor architecture, possibly implementing different schemes that follow other criteria for data separation, as long as checkpointing control must keep register data consistent with other system memory data.
[0026]
Next, a checkpoint state definition advantageous to such an IWB / SWB architecture will be described in detail.
[0027]
The checkpoint state contains as many slots as there are registers that can be updated per cycle. In one example, a single CISC instruction can update several of four registers. The checkpoint state therefore contains four slots, namely slot 0, slot 1, slot 2, and slot 3, plus an additional slot for storing the program counter. Each of the four target slots holds a register address 10 and the respective register data 12.
[0028]
Using this exemplary checkpoint state definition, four internal instructions representing up to four external CISC instructions can be retired every cycle. This means that up to 4 registers are updated every cycle. In addition, every instruction updates the state of the processor, for example the program counter 14 stored in the slot.
[0029]
For checkpointing multiple instructions according to the present invention, only the final status matters; that is, when several external CISC instructions complete simultaneously, the program counter is determined by the instruction completed last in the sequence of external instructions.
[0030]
Next, two examples are shown and described with respect to FIGS. 3 and 4, respectively.
[0031]
A CISC instruction (see the first column in the upper part of FIG. 3) is converted into four RISC-like instructions, corresponding to the four rows of the upper table, but this CISC instruction writes to two target registers, through its first and third RISC instructions. It therefore occupies two slots of the checkpoint state; in the lower part, the two leftmost slots are configured as shown in FIG. 1. The program counter 14 gives the instruction address of the next sequential instruction. The two slots of the checkpoint state that are in use (that is, the two leftmost slots) are marked by a valid bit.
[0032]
In another example, shown in FIG. 4, three CISC instructions, CISC#0 through CISC#2, are assumed to be converted into four RISC-like instructions. Specifically, the first instruction, CISC#0, is converted into two instructions, corresponding to the top two rows of the upper part of FIG. 4, and the following instructions, CISC#1 and CISC#2, are each represented by a single RISC-like instruction occupying just one row of the table. This sequence updates four registers, so the corresponding checkpoint state uses four slots, that is, all available slots. The status information of this checkpoint is derived from the last instruction of the checkpointed sequence of CISC instructions; that is, the program counter is taken from the third instruction, CISC#2.
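The two examples can be reproduced with a short Python sketch. The register numbers, data values, and program counter values below are invented for illustration, and the function name and tuple layout are hypothetical; only the slot counts and the choice of program counter follow the text.

```python
NUM_SLOTS = 4


def build_checkpoint_state(retired_ops):
    """Build one checkpoint state from the ops retired in a single cycle.

    retired_ops: list of (cisc_id, target_register_or_None, value, next_pc)
    tuples in program order.
    """
    slots = [(reg, val) for _, reg, val, _ in retired_ops if reg is not None]
    if len(slots) > NUM_SLOTS:
        raise OverflowError("more register updates than slots")
    last_id, _, _, last_pc = retired_ops[-1]  # youngest instruction wins
    valid = [True] * len(slots) + [False] * (NUM_SLOTS - len(slots))
    return {"id": last_id, "slots": slots, "valid": valid, "pc": last_pc}


# FIG. 3 style: one CISC instruction, four internal ops, two register writes.
fig3 = build_checkpoint_state([
    (0, 1, 0x11, 0x104), (0, None, None, 0x104),
    (0, 2, 0x22, 0x104), (0, None, None, 0x104),
])

# FIG. 4 style: CISC#0 gives two ops, CISC#1 and CISC#2 one op each.
fig4 = build_checkpoint_state([
    (0, 1, 0x11, 0x104), (0, 2, 0x22, 0x104),
    (1, 3, 0x33, 0x108), (2, 4, 0x44, 0x10C),
])
```

In the second case the state is tagged with the ID of CISC#2 and carries its program counter, because CISC#2 is the last instruction of the checkpointed sequence.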
[0033]
Referring now to FIGS. 5 and 6, an additional synchronization mechanism (double handshake) is disclosed. It can be applied to advantage in microprocessor architectures in which register data, such as instruction operands, and the above-mentioned STORE data, such as variables kept in memory, are used by one and the same CISC instruction.
[0034]
Basically, all checkpoint states (see FIG. 2 for basics) are advantageously tagged with the instruction ID 40 of the last external instruction in the checkpointed sequence. This is shown in FIG. 5 and FIG.
[0035]
In FIG. 6, the SWB portion on the right contains all STORE data. In this exemplary STORE data processing architecture, data, for example data associated with program variables, comes from a store queue (STQ) 62. STORE data is sent to the store-through (or write-through) L1 cache 61a and to ECC generation 61b. ECC generation 61b covers, for example, four words and advantageously corresponds to a complete L1 cache line containing the new STORE data; the STORE data thus undergoes ECC processing after instruction execution. The respective ECC-generated data is then buffered, error-free, in the store buffer 65, which is preferably also located in the L1 cache portion or, alternatively, close to it.
[0036]
The data is then released to the L2 cache 66, or to another suitable portion of the memory hierarchy, in synchronization with the release of the register data. This STORE data processing is preferred because data that may eventually be corrected during ECC processing is handled entirely on-chip, which is faster than ECC processing of data located off-chip, for example in the L2 cache. The IWB portion on the left, in turn, essentially comprises a checkpoint state buffer (CSB) 60 and a checkpointing array 64, which holds the error-checked and error-corrected architected ("designed") register data and is therefore abbreviated ECC-ARA.
[0037]
In general, checkpointing into the checkpointing array 64 (ECC-ARA) is performed as an atomic procedure. "Atomic" means that the checkpointing procedure must either be expected to complete, that is, to finish successfully so as to guarantee a reliable reset point for the one or more processors, or must not be allowed to start at all.
[0038]
In general, unless the checkpoint is completed, neither read access to the ECC-ARA of the checkpointing array 64 nor the reset function associated with the ECC-ARA is possible. This ensures that only completed external instructions are checkpointed into the ECC-ARA.
[0039]
Specifically, the preferred synchronization (double handshake) scheme between the update of the checkpointing array 64 and the update of the corresponding "designed" memory portion, the L2 cache 66, is as follows. In order to synchronize the register contents stored in the checkpointing array 64 with the data stored in the L2 cache 66, a first synchronization is established between the STQ 62 and the CSB 60, and a second synchronization is established between the checkpointing array 64 (its control logic) and the store buffer 65.
[0040]
Assuming an empty STQ 62, checkpointing occurs whenever a complete checkpoint condition is available in the CSB 60. Therefore, all data must be present, specifically register address 10 and register data 12 described above with respect to FIGS.
[0041]
If STORE data is found in the STQ 62, the instruction ID is signaled to the CSB 60 (see arrow 68, the first handshake signal), so that the corresponding checkpoint state can be checkpointed into the checkpointing array 64 (ECC-ARA). This is preferably done via control logic that includes a read port 63, which is arranged to read the CSB 60 and, when signaled accordingly, to access the checkpointing array 64 for reading and writing.
[0042]
When the checkpoint state has been fully checkpointed into the checkpointing array 64, the corresponding ID is read from the CSB 60, or preferably from the checkpointing array 64 via the read port 63, and sent to the store buffer 65, which contains the ECC-protected STORE data (see arrow 69, the second handshake signal). STORE data stored in the respective items of the store buffer 65 is released to the L2 cache only after the respective ID has been received from the checkpointing array 64. As those skilled in the art will appreciate, when these rules are followed the memory contents of the L2 cache 66 are always consistent with the register data found in the checkpointing array 64. Data related to the same instruction has the same ID in each of the storage means, that is, CSB 60, checkpointing array 64, STQ 62, store buffer 65, and L2 cache 66.
[0043]
In other words, synchronization is basically achieved by releasing to system memory only STORE data whose ID is older than or equal to the ID of the last checkpoint state used to update checkpointing array 64. STORE data corresponding to instructions that have not yet been checkpointed into the checkpointing array is retained until the corresponding checkpoint state has been used to update the checkpointing array 64. It is thus guaranteed that the contents of the checkpointing array and the data stored in the L2 cache or memory are consistent at all times.
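The release rule can be sketched in Python as a simplified software model, not as the hardware design itself. The class `StoreSync` and its method names are hypothetical. The sketch assumes that instruction IDs are integers that grow with program order (a smaller ID is older, with no wrap-around) and that both buffers are filled in program order.

```python
from collections import deque


class StoreSync:
    """Toy model of the double handshake between register checkpointing and STORE release."""

    def __init__(self):
        self.csb = deque()           # pending checkpoint states: (instr_id, {reg: value})
        self.store_buffer = deque()  # pending STORE data: (instr_id, address, value)
        self.ara = {}                # checkpointing array (ECC-ARA), modeled as a dict
        self.l2 = {}                 # L2 cache, modeled as a dict

    def collect(self, instr_id, reg_updates):
        self.csb.append((instr_id, dict(reg_updates)))

    def buffer_store(self, instr_id, address, value):
        self.store_buffer.append((instr_id, address, value))

    def first_handshake(self, signaled_id):
        """Arrow 68: checkpoint every CSB state whose ID is not younger than signaled_id.

        Returns the ID of the last checkpoint state written, or None if none was.
        """
        last_id = None
        while self.csb and self.csb[0][0] <= signaled_id:
            instr_id, regs = self.csb.popleft()
            self.ara.update(regs)
            last_id = instr_id
        return last_id

    def second_handshake(self, checkpointed_id):
        """Arrow 69: release STORE data whose ID is older than or equal to checkpointed_id."""
        while self.store_buffer and self.store_buffer[0][0] <= checkpointed_id:
            _, address, value = self.store_buffer.popleft()
            self.l2[address] = value
```

In this model a store belonging to instruction 2 stays in the store buffer after `second_handshake(1)` and reaches the L2 model only once a checkpoint state with ID 2 or younger has been written to the array.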
[0044]
If an error occurs inside the processor, no corrupted data is present in the L2 cache or memory. For recovery, the designed registers are restored from checkpointing array 64 and the processor state, for example the program counter, is restored; the processor can then restart program execution without damaging the data in system memory. The contents of the L1 cache must be purged. Note that all items older than the ID of the last checkpointed instruction must also be erased in the store buffer 65.
[0045]
It should be added that when the CSB 60 is full, the instruction committer is stopped, so that no new instructions are committed. The processor then stalls until the STQ 62 has stored its data in the store buffer 65.
[0046]
Further, the update of checkpointing array 64 may occur several cycles after the checkpoint state is created. During this time, multiple checkpoint states (preferably one per cycle) can be collected in the checkpoint state buffer (CSB) until they are finally used for a consistent update of the checkpointing array 64. When an error is recognized by any prior art error detection logic, which is not the subject of the present invention, checkpointing of the next checkpoint state stored in the CSB 60 is blocked immediately. It is thus ensured that the error does not affect the checkpointing array 64 and does not destroy the "correct" data stored in the L2 cache.
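The buffering, stalling, and blocking behavior described in the two preceding paragraphs can be modeled with a short Python sketch. The class `CheckpointStateBuffer` and its methods are hypothetical names, the capacity is an arbitrary parameter, and the model drains one checkpoint state per call, which is an assumption rather than something the description prescribes.

```python
from collections import deque


class CheckpointStateBuffer:
    """Toy model of a CSB that stalls commit when full and blocks updates after an error."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # (instr_id, {reg: value}), oldest first
        self.blocked = False    # set once error detection logic reports an error

    def can_commit(self):
        # The instruction committer must stop while the buffer is full.
        return len(self.entries) < self.capacity

    def push(self, instr_id, reg_updates):
        if not self.can_commit():
            raise RuntimeError("CSB full: commit must stall")
        self.entries.append((instr_id, dict(reg_updates)))

    def report_error(self):
        # Checkpointing of the next buffered state is blocked immediately.
        self.blocked = True

    def drain_one(self, ara):
        """Write the oldest checkpoint state into `ara`; return True if a state was written."""
        if self.blocked or not self.entries:
            return False
        _, regs = self.entries.popleft()
        ara.update(regs)
        return True
```

Because `report_error` only sets a flag that `drain_one` checks before touching the array, states created after the error never reach the array model, which mirrors the guarantee stated above.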
[0047]
With reference to FIG. 7, a checkpoint state extension is further disclosed for very complex external instructions that consist of more than four internal instructions.
[0048]
If a very complex external (CISC) instruction has to be converted into more than four internal instructions, the inventive scheme can be extended, with the benefit of the present disclosure, without abandoning its basic inventive features and without being restricted to providing a much wider, and therefore more area-consuming, CSB 60 to receive the increased number of internal instructions.
[0049]
Under this premise, the retirement of complex instructions continues over multiple cycles if only up to four internal instructions can be retired per cycle. This means that the checkpoint state created in every cycle does not represent one complete external instruction and therefore should not be used to update the checkpointing array.
[0050]
The solution to this problem is based on the idea that multiple checkpoint states are considered to form a single “extended checkpoint state” that represents the entire complex external instruction.
[0051]
According to a preferred embodiment, these incomplete checkpoint states occupy multiple items in CSB 60. The plurality of items preferably form a sequence. A glue / link bit 72 provided at a position within each entry of the store buffer 65 marks that successive checkpoint states stored in the store buffer 65 form an extended checkpoint state (see FIG. 7). The store buffer 65 must therefore possess at least as many items as are necessary to represent the extended checkpoint state of every possible complete CISC instruction.
[0052]
Furthermore, the update of the checkpointing array 64 (see FIG. 6) must not be interrupted until the atomic operation has completed.
[0053]
When an error is detected, the update mechanism must not be blocked until the atomic operation has been completely checkpointed. This also means that checkpointing an extended checkpoint state into the checkpointing array 64 may continue for multiple cycles.
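The extended checkpoint state can be sketched in Python as follows. This is only an illustration under stated assumptions: entries are modeled as dictionaries with a `glue` flag and a `regs` mapping, a set glue bit is taken to mean "the next entry belongs to the same CISC instruction", and the function names `next_atomic_group` and `checkpoint_extended` are hypothetical. The description does not fix the exact encoding of the glue / link bit.

```python
def next_atomic_group(entries):
    """Return the leading run of entries that forms one complete checkpoint state.

    Returns None if the list is empty or the run is still incomplete, that is,
    if the last entry seen still has its glue bit set.
    """
    group = []
    for entry in entries:
        group.append(entry)
        if not entry["glue"]:
            return group
    return None


def checkpoint_extended(ara, entries, error_detected=False):
    """Write one complete (possibly extended) checkpoint state into `ara`.

    The error flag is examined only before the update starts; once started,
    the multi-entry update runs to completion. Returns the number of entries
    consumed, or 0 if nothing was written.
    """
    group = next_atomic_group(entries)
    if group is None or error_detected:
        return 0
    for entry in group:  # in hardware this loop could span several cycles
        ara.update(entry["regs"])
    del entries[:len(group)]
    return len(group)
```

With three glued entries followed by the first entry of a still incomplete instruction, one call consumes exactly the three entries, and a second call writes nothing until the remaining instruction is complete.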
[0054]
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. However, it will be apparent that various modifications and changes may be made without departing from the broader spirit and scope of the invention as set forth in the claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
[0055]
Those skilled in the art will appreciate that the present invention proposes a new and advantageous scheme for updating the contents of checkpointing buffer means, such as checkpointing array 64, for example for multiple CISC instructions that are retired at the same time, and for synchronizing the release of STORE data to a cache hierarchy, such as the L2 cache 66, with the update of the checkpointing buffer means, so that an exact checkpoint of the processor register contents and of the data in system memory is achieved. This exact method of checkpointing allows the processor to be recovered upon detection of a soft or hard error in the processor without destroying data stored in memory. This is considered important for meeting increased computing requirements.
[Brief description of the drawings]
FIG. 1 is a schematic diagram illustrating the concept of conversion from a CISC instruction to multiple RISC instructions, introducing an inventive instruction ID.
FIG. 2 is a schematic representation of a checkpoint state definition according to the present invention.
FIG. 3 shows a schematic representation of a checkpoint state that includes a CISC instruction that is converted to two internal instructions with two register updates.
FIG. 4 shows the representation according to FIG. 2 in which three CISC instructions are converted into four internal instructions and a 2 + 1 + 1 register update.
FIG. 5 shows a schematic representation of a checkpoint state tagged with the ID of the last checkpointed CISC instruction.
FIG. 6 is a schematic representation showing the synchronization of register checkpointing and STORE data processing according to the present invention.
FIG. 7 is a schematic representation showing the principle of an extended checkpoint state according to the present invention.
[Explanation of symbols]
10 Register address
12 Register data
14 Program counter
40 Instruction ID
42 CISC instruction
44 Internal instructions
46 Load / store instructions
60 Checkpoint status buffer (CSB)
61a L1 cache
61b ECC generation
62 Store Queue (STQ)
63 Reading port
64 Checkpointing array
65 Store buffer
66 L2 cache
72 glue / link bit

Claims

A method for checkpointing a multi-processor data processing system to provide error recovery, comprising:
a) collecting, in a checkpoint state buffer (60), updates of a predetermined maximum number of register contents (12) performed by respective ones of a plurality of CISC instructions (42) or RISC instructions (44, 46), wherein a checkpoint state includes as many buffering slots as there are registers that can be updated by the plurality of CISC instructions (42), and an entry for the program counter value (14) associated with the youngest CISC instruction of the plurality of CISC instructions (42); and
b) after determining that no error has been detected in the currently collected register data (10, 12), updating an architected register array (ARA) (64) with the register data (10, 12) before or together with the completion of the youngest CISC instruction.

The method of claim 1, further comprising providing error detection and correction bits for the ARA (64) entries.

The method of claim 1, further comprising:
providing, in parallel with the ARA (64) update, a second control path that controls the release of STORE data resulting from a plurality of STORE instructions from a store buffer (65) to an architected-state cache memory (66);
synchronizing the STORE data release with the ARA update by tagging the checkpoint state buffer (60) entries with the instruction ID (40) of the youngest CISC instruction; and
releasing to the architected-state cache memory (66) only data having an instruction ID (40) older than or equal to the instruction ID (40) of the youngest CISC instruction.

The method of claim 3, wherein the synchronizing step comprises a double handshake operation between ARA update control (63) and STORE data release control, the double handshake operation comprising:
a) a first step of signaling (68) the instruction ID (40) of the youngest CISC instruction to the ARA update control (63) when the respective STORE data associated with at least the youngest CISC instruction resides in the store buffer (65), thereby triggering an ARA update that includes register instructions having an instruction ID (40) older than the signaled instruction ID (40) of the youngest CISC instruction; and
b) a second step of signaling (69) the instruction ID (40) of the youngest CISC instruction associated with the most recent ARA (64) update to the STORE data release control, in order to trigger the release of STORE data from the store buffer (65) to the architected-state cache memory (66), the release including STORE data resulting from instructions having an instruction ID (40) older than the signaled instruction ID (40) of the youngest CISC instruction.

The method of claim 1, extended to collect, in the checkpoint state buffer (60), updates of a predetermined extended maximum number of register contents performed by a respective complex CISC instruction, comprising:
a) reserving a respective extended plurality of checkpoint state buffer (60) entries to receive the register update data;
b) marking, by means of a glue bit (72), subsequent entries associated with one and the same CISC instruction; and
c) updating the checkpoint state thus extended in a multi-cycle atomic operation.

A processor unit comprising logic circuit means for performing the steps of the method according to any one of claims 1 to 5,
wherein the means for collecting a predetermined maximum number of register content updates is a checkpoint state buffer (60) (CSB) comprising a plurality of buffer entries, each including an instruction ID, a target register address, target register data, and a program counter,
whereby a checkpoint state comprises a plurality of entries.

A computer system comprising the processor unit according to claim 6.