JP2009151440A

JP2009151440A - Program hang-up detection method and computer device adopting the same method

Info

Publication number: JP2009151440A
Application number: JP2007327226A
Authority: JP
Inventors: Hiroki Konno; 廣毅今野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-12-19
Filing date: 2007-12-19
Publication date: 2009-07-09

Abstract

PROBLEM TO BE SOLVED: To provide a program hang-up detection method that achieves highly reliable hang-up detection by clearing the WDT without fail as long as the OS and the drivers under its control operate normally. SOLUTION: A computer device includes: an interrupt receiving means for receiving a system clock interrupt of a processor; a driver keep-alive confirmation means for generating a WDT keep-alive confirmation request on the basis of the received system clock interrupt; and a WDT clear requesting means for receiving the WDT keep-alive confirmation request and requesting that a watchdog timer in the processor be cleared. When a hang-up of a processing program of the operating system, or of a driver program under the control of the operating system, stops the driver keep-alive confirmation means or the WDT clear requesting means, clearing of the watchdog timer stops, and the processor is reset when the watchdog timer exceeds its threshold. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a program hang detection method for detecting a hang of a program, and to a computer device to which the method is applied.

FIG. 1 is a block diagram of an example of a computer device using a conventional program hang detection method. Application programs (hereinafter, "applications") 1A, 1B, and 1C are monitored by a system monitoring application 2, and the system monitoring application 2 supplies a WDT (watchdog timer) clear request to a WDT driver 3 as a keep-alive notification for the applications 1A, 1B, and 1C.

Each of OS processing programs (hereinafter, "OS processes") 5A, 5B, and 5C in an OS (operating system) 5 also supplies a WDT clear request to the WDT driver 3 as a keep-alive notification, as does each of driver programs (hereinafter, "drivers") 6A, 6B, and 6C under the OS.

On receiving a WDT clear request, the WDT driver 3 clears a WDT 8 in a processor 7. If the WDT 8 is not cleared within a predetermined time, its counter expires and it generates a WDT reset signal that resets a CPU core 9.
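The clear-or-expire behaviour of the conventional WDT can be illustrated with a minimal simulation. This is only a sketch: the class `WatchdogTimer`, the helper `run`, and the tick-based timing are hypothetical names and simplifications, not anything defined in the patent.

```python
class WatchdogTimer:
    """Down-counting watchdog: reloads on clear, latches a reset at zero."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.count = timeout_ticks
        self.reset_fired = False

    def clear(self):
        # WDT clear: return the counter to its initial value.
        self.count = self.timeout_ticks

    def tick(self):
        # One clock period elapses; expire when the counter reaches zero.
        if self.reset_fired:
            return
        self.count -= 1
        if self.count <= 0:
            self.reset_fired = True


def run(wdt, total_ticks, clear_every):
    """Advance the clock; software clears the WDT every `clear_every` ticks.

    A `clear_every` of 0 models software that never clears the WDT.
    Returns True if the WDT reset fired.
    """
    for t in range(1, total_ticks + 1):
        if clear_every and t % clear_every == 0:
            wdt.clear()
        wdt.tick()
    return wdt.reset_fired
```

With a timeout of 5 ticks, clearing every 3 ticks keeps the system alive, while clearing every 8 ticks (or never) lets the counter expire.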

Patent Document 1 describes providing a software WDT that detects abnormalities in a monitoring program and a hardware WDT that detects abnormalities in the software WDT.

Patent Document 2 describes a high-level task monitoring the flag operations of a low-level task in order to detect abnormalities in a middle-level task.

Patent Document 3 describes generating an interrupt when the WDT has not been accessed for a certain period, and generating a reset signal if the WDT remains unaccessed for a further certain period.

Patent Document 4 describes storing a task number and an execution step number as an error log in order to identify the location of an abnormality.

Patent Document 5 describes monitoring software that monitors the state of an application and restarts the application when an abnormality occurs.

Patent Document 6 describes a periodically activated monitoring task that monitors tasks which operate irregularly.
Japanese Patent No. 2870250; JP-A 63-280345; JP-A 1-154258; JP 2004-355384 A; JP 2002-149437 A; JP 2006-227962 A

In the conventional program hang detection method shown in FIG. 1, each of the OS processes 5A, 5B, 5C and the drivers 6A, 6B, 6C issues a WDT clear request to the WDT driver 3 whenever it judges one to be necessary.

Suppose that, as shown in FIG. 2(A), the loop processing in the driver 6A is set up to perform a keep-alive notification (WDT clear request) 6A-2 after a device operation 6A-1, and that the driver operates normally. If a change to the driver adds a device operation 6A-3 to the loop as shown in FIG. 2(B), the keep-alive notification (WDT clear request) 6A-2 is issued later than originally assumed, a WDT reset is triggered unexpectedly, and system reliability is degraded.

Likewise, suppose that, as shown in FIG. 3(A), the device operation 6B-1 in the driver 6B loops only a few times, so that no keep-alive notification is needed and the driver operates normally. If a change to the driver increases the loop count as shown in FIG. 3(B), the loop takes longer and the keep-alive notification comes later than assumed; because the loop contains no keep-alive notification step, a WDT reset is triggered unexpectedly, and system reliability is degraded.
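The two failure cases of FIG. 2 and FIG. 3 come down to the same arithmetic: the interval between keep-alive notifications grows past the WDT timeout. The sketch below works this through with made-up durations; the function name `keep_alive_is_late` and every millisecond figure are assumptions chosen for illustration only.

```python
def keep_alive_is_late(step_durations_ms, loop_count, wdt_timeout_ms):
    """Return True if the gap between keep-alive notifications exceeds the WDT timeout.

    step_durations_ms: durations of the operations executed per loop pass
    loop_count: number of loop passes between two keep-alive notifications
    """
    interval_ms = sum(step_durations_ms) * loop_count
    return interval_ms > wdt_timeout_ms


WDT_TIMEOUT_MS = 100  # hypothetical timeout

# FIG. 2: the keep-alive is inside the loop, so one pass separates notifications.
fig2_a = keep_alive_is_late([40], 1, WDT_TIMEOUT_MS)        # only operation 6A-1
fig2_b = keep_alive_is_late([40, 80], 1, WDT_TIMEOUT_MS)    # operation 6A-3 added

# FIG. 3: no keep-alive inside the loop, so the whole loop separates notifications.
fig3_a = keep_alive_is_late([10], 5, WDT_TIMEOUT_MS)        # few iterations
fig3_b = keep_alive_is_late([10], 20, WDT_TIMEOUT_MS)       # loop count increased
```

In both (B) cases the driver itself is healthy; only its timing changed, which is why the resulting reset appears sudden.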

Furthermore, the conventional method makes no distinction between a hang of the OS 4 or the drivers 5A to 5C under the OS and a hang of the applications 1A to 1C running on the OS; in either case a system reset (a reset of the CPU core 9) is performed.

Because the system exception is raised by expiration of the counter in the WDT 8 itself, it is difficult to identify the task that caused the program hang. The task that had been running had to be found from the system's stack data and the like, and the suspect task then had to be located by program tracing. This lengthened the system's mean time to repair (MTTR) and degraded the maintainability, reliability, and availability of the system.

FIG. 4 shows the flow of processing when a program hang occurs. First, the stack data of the CPU core 9 is saved and the CPU core 9 is reset. The saved stack data is then collected and analyzed to find the cause, and once the cause is identified the faulty part is corrected. After the reset the CPU core 9 appears to have recovered and to be running, but true operation resumes only after the cause has been identified and corrected.

The present invention has been made in view of the above points, and an object thereof is to provide a program hang detection method that detects program hangs with high reliability by always clearing the WDT as long as the OS and the drivers under the OS are operating normally.

In one embodiment of the present invention, a computer device has:
interrupt receiving means for receiving a system clock interrupt of a processor;
driver keep-alive confirmation means for generating a WDT keep-alive confirmation request based on the received system clock interrupt; and
WDT clear request means for receiving the WDT keep-alive confirmation request and requesting that a watchdog timer in the processor be cleared.
When a hang of an operating system processing program or of a driver program under the operating system stops the driver keep-alive confirmation means or the WDT clear request means, clearing of the watchdog timer stops, and the processor is reset when the watchdog timer exceeds its threshold.
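The chain described here (clock interrupt, keep-alive confirmation, WDT clear) can be sketched as a small tick-driven simulation. The names `Wdt` and `simulate`, the `hang_at` parameter, and the one-clear-per-tick timing are assumptions made for this sketch rather than details taken from the patent.

```python
class Wdt:
    """Hardware watchdog that counts down on every clock period."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = threshold
        self.processor_reset = False

    def clear(self):
        self.count = self.threshold

    def hardware_tick(self):
        self.count -= 1
        if self.count <= 0:
            self.processor_reset = True


def simulate(ticks, wdt_threshold, hang_at=None):
    """Return the tick at which the processor is reset, or None if it never is.

    hang_at: tick from which an OS process or driver is assumed to hang,
    so that the keep-alive confirmation and WDT clear code no longer runs.
    """
    wdt = Wdt(wdt_threshold)
    for t in range(ticks):
        os_and_drivers_alive = hang_at is None or t < hang_at
        # The system clock interrupt arrives on every tick regardless.
        if os_and_drivers_alive:
            # Keep-alive confirmation -> WDT clear request -> WDT clear.
            wdt.clear()
        wdt.hardware_tick()
        if wdt.processor_reset:
            return t
    return None
```

Because the clear is driven by the clock rather than by each driver's own logic, the only way for the WDT to expire in this model is for the clearing path itself to stop running.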

Preferably, the driver keep-alive confirmation means further generates a count request based on the received system clock interrupt, and the computer device further has:
system monitoring means for monitoring the operation of application programs above the operating system and issuing a keep-alive notification;
initialization control means for initializing application programs; and
application hang processing means for counting a hang counter when the count request is received, resetting the count value of the hang counter when the keep-alive notification is received, and, when the count value of the hang counter exceeds a threshold, instructing the initialization control means to initialize at least the application program being executed.

According to the present invention, the WDT is always cleared as long as the OS and the drivers under the OS are operating normally, so program hangs can be detected with high reliability.

In a more preferred example, when an application above the OS hangs, only the application is reset while the OS keeps running, and the suspect task can be identified easily.

Embodiments of the present invention are described below with reference to the drawings.

<Embodiment>
FIG. 5 shows an example hardware configuration of an embodiment of a computer device to which the program hang detection method of the present invention is applied. In the figure, a CPU 10, a RAM 11, a ROM 12, and a chipset 13 are connected to one another by an internal bus 14. Peripheral devices such as a flash memory 15, a hard disk device (HDD) 16, and a communication circuit 17 are connected to the chipset 13.

The hard disk device 16 stores software such as the OS, various drivers, and various applications. This software is transferred from the hard disk device 16 to the RAM 11 and executed by the CPU 10.

FIG. 6 shows an example configuration of an embodiment of a computer device to which the program hang detection method of the present invention is applied. FIG. 7 shows an example configuration in which the program hang detection part for applications above the OS is extracted from FIG. 6. FIG. 8 shows an example configuration in which the program hang detection part for OS processes and drivers under the OS is extracted from FIG. 6.

In FIGS. 6 and 7, a TIC (Time Interval Counter) interrupt reception function unit (handler) 21 receives the system clock interrupt supplied from a system timer 33 in a processor 30 and passes it to a TIC driver keep-alive confirmation function unit 22. The TIC interrupt reception function unit 21 and the TIC driver keep-alive confirmation function unit 22 are the highest-priority processes.

The TIC driver keep-alive confirmation function unit (driver) 22 generates, based on the system clock interrupt from the TIC interrupt reception function unit 21, a decrement request (a signal with a predetermined period) as a WDT keep-alive confirmation request and supplies it to an application hang driver 23. It also generates a WDT clear request (for example, a signal with the same period as the decrement request) as a keep-alive notification and supplies it to a WDT driver 24. In addition, the TIC driver keep-alive confirmation function unit 22 supplies the system clock to an OS 40.

The WDT driver 24 confirms that the OS processes 41A, 41B, 41C, etc. in the OS 40 and the drivers 42A, 42B, 42C, etc. under the OS are alive. The drivers 42A, 42B, 42C, etc. are programs that control peripheral devices such as the communication circuit 17. On receiving a WDT clear request from the TIC driver keep-alive confirmation function unit 22, the WDT driver 24 performs a WDT clear (keep-alive notification) that sets the count value of a WDT 31 in the processor 30 to its initial value. On receiving a hang-up notification from the WDT 31, it writes the hang cause to a kernel log area 35 allocated in non-volatile memory, for example the hard disk device 16.

The processor 30 corresponds to the CPU 10 of FIG. 5. The WDT 31 counts down the system clock from the system timer 33. When the count value (stored in a WDT register in the WDT 31) reaches 0, meaning the WDT threshold has been exceeded, the WDT 31 supplies a WDT reset signal to a CPU core 32 and resets it. The kernel log area 35 is provided, for example, in the hard disk device 16 of FIG. 5.

In FIGS. 6 and 8, the application hang driver 23 confirms that applications 50A, 50B, 50C, etc. above the OS 40 are alive. On receiving a decrement request from the TIC driver keep-alive confirmation function unit 22, the application hang driver 23 decrements a hang counter 25.

When the application hang driver 23 receives a keep-alive notification (counter clear request) for the applications 50A, 50B, 50C, etc. from a system monitoring function unit 51 above the OS 40, it resets the hang counter 25 to its initial value (the threshold). When the hang counter 25 exceeds the threshold (that is, when its count value reaches 0), the application hang driver 23 either sends a threshold-exceeded notification, together with the hang target application ID (the identifier of the hung application), to an initialization control application 52, or generates an autonomous reset signal and sends it to the CPU core 32.
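The three behaviours of the application hang driver (decrement on request, reload on keep-alive, notify at zero) can be sketched as follows. The class `AppHangDriver`, its method names, and the callback `on_exceeded` are hypothetical; the sketch covers a single hang counter and only the notification branch, not the autonomous reset.

```python
class AppHangDriver:
    """Minimal model of a hang counter managed by an application hang driver."""

    def __init__(self, threshold, on_exceeded):
        self.threshold = threshold
        self.counter = threshold          # initial value equals the threshold
        self.on_exceeded = on_exceeded    # called with the hung application ID

    def decrement_request(self, running_app_id):
        # Driven by the clock-derived decrement request.
        if self.counter == 0:
            return                        # already reported; wait for a reload
        self.counter -= 1
        if self.counter == 0:
            self.on_exceeded(running_app_id)

    def keep_alive(self):
        # Counter clear request from the system monitor.
        self.counter = self.threshold
```

A usage sketch: with a threshold of 3, two decrements followed by a keep-alive leave the counter safely reloaded, and only three decrements in a row without a keep-alive trigger the notification.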

One or more hang counters 25 are provided in the RAM 11. For example, when first, second, and third hang counters 25a, 25b, and 25c are provided, their thresholds (each threshold represents a time) are set as follows.

threshold of hang counter 25a < threshold of hang counter 25b < threshold of hang counter 25c < threshold of WDT

When there is only one hang counter 25, its threshold is the same as that of the hang counter 25a above.
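The ordering constraint on the thresholds is easy to check mechanically at configuration time. The helper below is a sketch; the name `thresholds_are_ordered` and the tick values in the usage note are assumptions.

```python
def thresholds_are_ordered(hang_thresholds, wdt_threshold):
    """Check hang counter thresholds are strictly increasing and all below the WDT threshold.

    hang_thresholds: thresholds of hang counters 25a, 25b, 25c, ... in that order
    """
    sequence = list(hang_thresholds) + [wdt_threshold]
    return all(a < b for a, b in zip(sequence, sequence[1:]))
```

For example, `thresholds_are_ordered([10, 20, 30], 40)` holds, whereas `thresholds_are_ordered([10, 20, 50], 40)` does not, because the third hang counter would outlast the WDT.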

The hang counters 25a, 25b, and 25c are set to their initial values (thresholds) at system startup and are decremented by the application hang driver 23.

When there are multiple hang counters 25 and the hang counter 25a reports that its threshold has been exceeded, the application hang driver 23 instructs the initialization control application 52 to reset only the hang target application (for example, 50A) and sets the hang count to 1. The hang count is held in the RAM 11 and its initial value is 0. If the hang counter 25b reports that its threshold has been exceeded while the hang count is 1, the application hang driver 23 instructs the initialization control application 52 to reset only the application group to which the hang target application belongs (for example, 50A and 50B) and sets the hang count to 2. If the hang counter 25c reports that its threshold has been exceeded while the hang count is 2, the application hang driver 23 generates an autonomous reset signal and sends it to the CPU core 32.
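The staged escalation with three hang counters can be written as a small decision function. This is a sketch under assumptions: the function name `escalate`, the string labels for counters and actions, and the "none" result for combinations the text does not describe are all hypothetical.

```python
def escalate(expired_counter, hang_count):
    """Map an expired hang counter and the current hang count to (action, new_hang_count)."""
    if expired_counter == "25a":
        # First stage: reset only the hung application.
        return "reset_app", 1
    if expired_counter == "25b" and hang_count == 1:
        # Second stage: reset the application group of the hung application.
        return "reset_app_group", 2
    if expired_counter == "25c" and hang_count == 2:
        # Final stage: autonomous reset signal to the CPU core.
        return "autonomous_reset", hang_count
    # Combination not described in the text; treated here as no action (assumption).
    return "none", hang_count
```

Each stage widens the scope of the reset, so a full processor reset is reached only after the narrower resets have failed to restore the keep-alive notification.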

When there is only one hang counter 25, the application hang driver 23, on first receiving a threshold-exceeded notification, stores hang count = 1 for the hang target application (for example, 50A) in the RAM 11 and instructs the initialization control application 52 to reset only that application. If the application 50A hangs again, the hang count is set to 2 and the initialization control application 52 is instructed to reset only the application group to which the hang target application belongs (for example, 50A and 50B). If the application 50A hangs yet again, the hang count is set to 3 and the initialization control application 52 is instructed to reset all applications (for example, 50A, 50B, and 50C). If the application 50A hangs once more, the hang count is set to 4 and an autonomous reset signal is generated and sent to the CPU core 32.
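With a single hang counter, the escalation is keyed on a per-application hang count instead. A minimal sketch, assuming hypothetical names (`ACTIONS`, `SingleCounterEscalation`, `on_threshold_exceeded`) and assuming that any hang beyond the fourth keeps requesting the autonomous reset:

```python
# Action taken for the 1st, 2nd, 3rd, and 4th hang of the same application.
ACTIONS = {
    1: "reset_app",
    2: "reset_app_group",
    3: "reset_all_apps",
    4: "autonomous_reset",
}


class SingleCounterEscalation:
    """Tracks hang counts per application and picks the reset scope."""

    def __init__(self):
        self.hang_counts = {}  # application ID -> number of hangs so far

    def on_threshold_exceeded(self, app_id):
        count = self.hang_counts.get(app_id, 0) + 1
        self.hang_counts[app_id] = count
        # Beyond the fourth hang, stay at the widest action (assumption).
        return ACTIONS[min(count, 4)]
```

Calling `on_threshold_exceeded("50A")` four times in a row walks through the four stages in order.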

In the above embodiment the hang counter 25 is decremented (counted down), but it may instead be incremented (counted up) and checked for exceeding the threshold.
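The two counting directions are interchangeable as long as both report the threshold after the same number of counts. The sketch below assumes that "exceeding the threshold" means reaching it, so that the up-counter fires on the same tick as a down-counter reaching 0; the class and function names are hypothetical.

```python
class DownCounter:
    """Starts at the threshold and fires when it reaches zero."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.value = threshold

    def count(self):
        self.value -= 1
        return self.value <= 0

    def reset(self):
        self.value = self.threshold


class UpCounter:
    """Starts at zero and fires when it reaches the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.value = 0

    def count(self):
        self.value += 1
        return self.value >= self.threshold

    def reset(self):
        self.value = 0


def ticks_until_exceeded(counter):
    """Count until the counter reports that its threshold was exceeded."""
    ticks = 0
    while True:
        ticks += 1
        if counter.count():
            return ticks
```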

The system monitoring application 51 monitors whether all of the applications 50A, 50B, and 50C above the OS 40 are alive, and periodically generates a counter clear request as a keep-alive notification and supplies it to the application hang driver 23. The system monitoring application 51 is the lowest-priority application on the OS, for example an idle task, and generates the counter clear request each time execution switches among the applications 50A, 50B, and 50C, which occupy the CPU core 32 and run in a time-shared manner. If any of the applications 50A, 50B, and 50C hangs, for example by entering an infinite loop, the right to use the CPU core 32 is never passed to the system monitoring application 51 (idle task), so the system monitoring application 51 stops issuing keep-alive notifications (counter clear requests).
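Why a lowest-priority monitor detects a hung application can be seen in a toy scheduler. This is a simplified sketch, not the scheduling of any real OS: the function `count_clear_requests`, the work units, and the rule that applications become ready again after the idle task runs are all assumptions.

```python
def count_clear_requests(app_work, steps):
    """Count how often the idle-priority monitor gets to run in `steps` time slices.

    app_work: dict of application ID -> time slices of work per cycle,
              or None for an application stuck in an infinite loop.
    """
    remaining = dict(app_work)
    clears = 0
    for _ in range(steps):
        ready = [a for a, w in remaining.items() if w is None or w > 0]
        if ready:
            # Any ready application outranks the monitor.
            app = ready[0]
            if remaining[app] is not None:
                remaining[app] -= 1
        else:
            # Nothing else is ready: the monitor runs and issues a counter clear request.
            clears += 1
            remaining = dict(app_work)  # applications get new work (assumption)
    return clears
```

With healthy applications the monitor runs regularly; once one application never yields, the count of clear requests stays at zero and the hang counter is left to run down.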

On receiving from the application hang driver 23 a threshold-exceeded notification that designates the applications to be reset, the initialization control application 52 initializes (resets) those applications.

<At system startup>
At system startup, the hang counters 25a to 25c are initialized, that is, their thresholds are set. From then on, the application hang driver 23 counts the hang counters 25a to 25c down.

The count value of the WDT 31 is also initialized at system startup, that is, its threshold is set. From then on, the WDT 31 counts down.

<Hang monitoring of the OS and drivers under the OS during normal operation>
FIG. 9 shows the operation sequence of the computer device of FIG. 6 during normal operation.

In FIG. 9, the TIC interrupt reception function unit 21 receives a system clock interrupt from the processor 30 (step S0). The TIC driver keep-alive confirmation function unit 22 issues a WDT clear request to the WDT driver 24 (step S1).

On receiving the keep-alive notification (WDT clear request), the WDT driver 24 issues a WDT clear that returns the WDT 31 of the processor 30 to its initial value (step S3). The WDT 31 then starts counting down from the initial value.

<Hang monitoring of applications above the OS during normal operation>
In FIG. 9, the TIC interrupt reception function unit 21 receives a system clock interrupt from the processor 30 (step S0). The TIC driver keep-alive confirmation function unit 22 issues a hang counter decrement request to the application hang driver 23 (step S2). The application hang driver 23 decrements the hang counters 25a to 25c used for application hang detection (step S4). This processing is performed periodically under the OS.

Meanwhile, the system monitoring application 51 above the OS issues a keep-alive notification to the application hang driver 23 via the OS 40 (steps S5 and S6). On receiving the keep-alive notification, the application hang driver 23 returns the hang counters 25a to 25c to their initial values (step S7). The hang counters 25a to 25c then start decrementing from the initial values.

<Hang detection for OS processes and drivers under the OS>
FIG. 10 shows the operation sequence of the computer device of FIG. 6 when a driver hang is detected.

In FIG. 10, suppose that among the OS processes 41A to 41C and the drivers 42A to 42C running under the OS, the driver 42A, for example, enters an infinite loop and hangs (step S11). The right to use the CPU core 32 is then no longer passed to the TIC driver keep-alive confirmation function unit 22 or the WDT driver 24. Processing up to the TIC interrupt reception function unit 21 still operates.

As a result, the WDT driver 24 either no longer receives the keep-alive notification (WDT clear request) or cannot process it even if received, so it can no longer issue the WDT clear that returns the count value of the WDT 31 of the processor 30 to its initial value. The application hang driver 23 stops operating for the same reason, so decrementing of the hang counters 25a to 25c also stops. The drivers 42A to 42C under the OS and the OS processes 41A to 41C stop as well.

The WDT 31 eventually counts down by the function of the processor 30 (step S12), its count value reaches 0 (step S13), a reset signal is generated for the CPU core 32, and a system reset is performed by hardware (step S14).

In this way, the WDT is always cleared as long as the OS processes and the drivers under the OS are operating normally, so program hangs can be detected with high reliability.

Rather than leaving it to each OS process or driver under the OS to decide, depending on its own processing, whether to issue a keep-alive notification that clears the WDT, the WDT clear is issued uniformly from the system clock. The system clock from the CPU fires the trigger periodically, independently of the OS and the drivers, so a WDT clear trigger is always delivered as long as the OS and driver processing are operating normally, and program hangs can be detected with high reliability.

<Hang detection for applications above the OS>
FIG. 11 shows the operation sequence of the computer device of FIG. 6 when an application hang is detected.

In FIG. 11, suppose that among the applications 50A to 50C above the OS 40, the application 50A, for example, enters an infinite loop and hangs (step S21). The right to use the CPU core 32 is then no longer passed to the system monitoring application 51, and the system monitoring application 51 stops issuing keep-alive notifications (counter clear requests).

Because the threshold of the WDT 31 is set to a longer time than the thresholds of the hang counters 25a to 25c, the WDT 31 never exceeds its threshold first. Moreover, unless a hang occurs in the OS processes 41A to 41C or the drivers 42A to 42C under the OS, the application hang driver 23 keeps receiving decrement requests from the TIC driver keep-alive confirmation function unit 22, so the hang counters 25a to 25c continue to be decremented (steps S22 and S23).

Because the system monitoring application 51 no longer issues keep-alive notifications (hang counter clear requests), the first hang counter 25a exceeds its threshold and its count value reaches 0. When the application hang driver 23 detects this, it finds the application 50A that is running or still holds the execution right from the management information of the OS 40 held in the RAM 11 (step S24), saves the detailed task information of the hung application 50A and the hang count to the kernel log area 35 in non-volatile memory (step S25), and notifies the initialization control application 52 that the first hang counter 25a has exceeded its threshold, together with the detailed task information of the hung application 50A (steps S26 and S27).
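Steps S24 to S27 (find the hung task, log it, notify the initialization control application) can be sketched as one handler. The data shapes are assumptions for illustration: tasks are plain dicts with hypothetical `id` and `state` fields, the kernel log is a list, `notify` is any callable, and incrementing the hang count inside the handler is a simplification.

```python
def handle_threshold_exceeded(counter_name, os_tasks, hang_counts, kernel_log, notify):
    """Handle an expired hang counter; return the hung application ID, or None."""
    # S24: find the task that is running or still holds the execution right.
    hung = next(
        (t for t in os_tasks if t["state"] in ("running", "holds_cpu")), None
    )
    if hung is None:
        return None
    app_id = hung["id"]
    hang_counts[app_id] = hang_counts.get(app_id, 0) + 1
    # S25: save task details and hang count to the kernel log area.
    kernel_log.append(
        {"counter": counter_name, "task": dict(hung), "hang_count": hang_counts[app_id]}
    )
    # S26, S27: notify the initialization control application.
    notify({"counter": counter_name, "task": dict(hung)})
    return app_id
```

Writing the log entry before sending the notification means the record of the suspect task survives even if the subsequent reset does not resolve the hang.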

初期化制御アプリ５２は、ハングカンウンタ２５ａの閾値超過とハングしているアプリ５０Ａのタスク詳細情報を受け取り、ハングカンウンタ２５ａの閾値超過に該当する例えばアプリ５０Ａのみの再初期化を行う（ステップＳ２８）。 The initialization control application 52 receives the detailed task information of the hang counter 25a exceeding the threshold and the hang counter 25a, and performs re-initialization of only the application 50A corresponding to the hang counter 25a exceeding the threshold (step) S28).

上記の再初期化でアプリ５０Ａのハングが解決した場合は、ＣＰＵ使用権がシステム監視アプリ５１に渡ることになるのでシステム監視アプリ５１からの生存通知（ハングカウンタクリア要求）が出るようになり（ステップＳ２９）、アプリハングドライバ２３はアプリハング検出用のハングカウンタ２５ａ〜２５ｃを初期値に戻すので、再度監視が始まる。 When the hang of the application 50A is resolved by the above re-initialization, the CPU usage right is passed to the system monitoring application 51, so that a survival notification (hang counter clear request) is issued from the system monitoring application 51 ( In step S29), the application hang driver 23 resets the hang counters 25a to 25c for detecting application hangs to the initial values, so that monitoring starts again.

上記の再初期化でアプリ５０Ａのハングが解決しない場合は、ＣＰＵ使用権がシステム監視アプリ５１に渡らないことになるのでシステム監視アプリ５１からの生存通知が出ない。 If the hang of the application 50 A is not resolved by the above re-initialization, the CPU usage right does not pass to the system monitoring application 51, so that no survival notification is issued from the system monitoring application 51.

システム監視アプリ５１からの生存通知が出なくなることで、第２のハングカウンタ２５ｂが閾値を超えてカウント値が０になる。これをアプリハングドライバ２３が検出した場合、ＯＳの管理情報から実行中もしくは実行権を保持したままのアプリ５０Ａを見つけ出し、カーネルログエリア３５にハングしているアプリ５０Ａのタスク詳細情報とハング回数をセーブし、初期化制御アプリ５２に対して、第２のハングカウンタ２５ｂの閾値超過をハングしているアプリ５０Ａのタスク詳細情報と共に通知する。 Since the survival notification from the system monitoring application 51 is not issued, the second hang counter 25b exceeds the threshold value and the count value becomes zero. When this is detected by the application hang driver 23, the application 50A that is being executed or has the execution right held is found from the management information of the OS, and the task detailed information and the number of hangs of the application 50A that is hung in the kernel log area 35 are obtained. Save and notify the initialization control application 52 together with the task detail information of the application 50A that is hung when the second hang counter 25b exceeds the threshold value.

初期化制御アプリ５２はハングカウンタ２５ｂの閾値超過とハングしているアプリ５０Ａのタスク詳細情報を受け取り、ハングカウンタ２５ｂの閾値超過に該当する例えばアプリ５０Ａの属するグループのアプリ５０Ａ，５０Ｂの再初期化を行う。 The initialization control application 52 receives the notification that the hang counter 25b has exceeded its threshold, together with the task detail information of the hung application 50A, and performs the re-initialization that corresponds to the hang counter 25b exceeding its threshold, for example re-initializing the applications 50A and 50B of the group to which the application 50A belongs.

この再初期化でアプリ５０Ａのハングが解決した場合は、ＣＰＵ使用権がシステム監視アプリ５１に渡ることになるのでシステム監視アプリ５１からの生存通知が出るようになり、アプリハングドライバ２３はアプリハング検出用のハングカウンタ２５ａ〜２５ｃを初期値に戻すので、再度監視が始まる。 If this re-initialization resolves the hang of the application 50A, the CPU usage right passes to the system monitoring application 51, so the system monitoring application 51 resumes issuing survival notifications. The application hang driver 23 then returns the hang counters 25a to 25c for application hang detection to their initial values, and monitoring starts again.

この再初期化で、アプリ５０Ａのハングが解決しない場合は、ＣＰＵ使用権がシステム監視アプリ５１に渡らないことになるのでシステム監視アプリ５１からの生存通知が出ない。 If this re-initialization does not resolve the hang of the application 50A, the CPU usage right does not pass to the system monitoring application 51, so no survival notification is issued from the system monitoring application 51.

システム監視アプリ５１からの生存通知が出なくなることで、第３のハングカウンタ２５ｃが閾値を超えてカウント値が０になる。これをアプリハングドライバ２３が検出した場合、ＯＳの管理情報から実行中もしくは実行権を保持したままのアプリ５０Ａを見つけ出し、カーネルログエリア３５にハングしているアプリ５０Ａのタスク詳細情報とハング回数をセーブし、初期化制御アプリ５２に対して、第３のハングカウンタ２５ｃの閾値超過をハングしているアプリ５０Ａのタスク詳細情報と共に通知する。 Because survival notifications from the system monitoring application 51 have stopped, the third hang counter 25c exceeds its threshold and its count value reaches 0. When the application hang driver 23 detects this, it finds, from the OS management information, the application 50A that is running or still holding the execution right, saves the task detail information and the hang count of the hung application 50A in the kernel log area 35, and notifies the initialization control application 52 that the third hang counter 25c has exceeded its threshold, together with the task detail information of the hung application 50A.

第３のハングカウンタ２５ｃの閾値超過を最終エスカレーションフェーズとして、アプリハングドライバ２３はＣＰＵコア３２に対して自律リセット信号を出し、システム全体のリセットを行う。 Treating the third hang counter 25c exceeding its threshold as the final escalation phase, the application hang driver 23 issues an autonomous reset signal to the CPU core 32 and resets the entire system.
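The three-phase escalation described above can be illustrated with a minimal Python sketch. It is not the patented implementation. The class names, the action names, and the initial counter values (3, 6 and 9 count requests) are assumptions made for illustration only; the text states only that the counters have different thresholds and that a counter's value reaches 0 when its threshold is exceeded.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical action names for the three escalation phases in the text:
# re-initialize the hung app, re-initialize its group, reset the system.
ACTIONS = ("reinit_app", "reinit_group", "system_reset")


@dataclass
class HangCounter:
    initial: int            # assumed initial value, in count requests
    value: int = 0
    expired: bool = False

    def __post_init__(self) -> None:
        self.value = self.initial

    def reset(self) -> None:
        """Return the counter to its initial value (survival notification)."""
        self.value = self.initial
        self.expired = False

    def count(self) -> bool:
        """Count down once; return True only on the request that reaches 0."""
        if self.expired:
            return False
        self.value -= 1
        if self.value <= 0:
            self.expired = True
            return True
        return False


class AppHangDriver:
    """Toy stand-in for the application hang driver 23."""

    def __init__(self, initials=(3, 6, 9)) -> None:
        self.counters = [HangCounter(n) for n in initials]
        self.actions: List[str] = []  # escalation actions requested so far

    def on_count_request(self) -> None:
        # Every count request advances all counters; each one escalates once.
        for counter, action in zip(self.counters, ACTIONS):
            if counter.count():
                self.actions.append(action)

    def on_survival_notification(self) -> None:
        # A survival notification restores all counters to their initial values.
        for counter in self.counters:
            counter.reset()


def run_without_survival(ticks: int) -> List[str]:
    """Feed `ticks` count requests with no survival notification at all."""
    driver = AppHangDriver()
    for _ in range(ticks):
        driver.on_count_request()
    return driver.actions
```

Under these assumed values, a run with no survival notification escalates in order from application re-initialization to group re-initialization to system reset, while a survival notification at any point restores all three counters and restarts monitoring.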

このように、ＯＳやＯＳ配下のドライバのハングの場合はハードウェアからのＷＤＴリセットでシステムリセットさせ、ＯＳより上位のアプリのハング時にはＯＳは動作させたままアプリだけをリセットすることで、ＯＳが動作していることから、ハングしているアプリのタスク詳細情報とハング回数を不揮発性メモリのカーネルログエリア３５にセーブすることができ、被疑要因タスクの特定を容易に行うことができる。
（付記１）
プロセッサのシステムクロック割り込みを受信する割り込み受信手段と、
受信したシステムクロック割り込みを基にＷＤＴ生存確認要求を生成するドライバ生存確認手段と、
前記ＷＤＴ生存確認要求を受信して前記プロセッサ内のウォッチドッグタイマのクリア要求を行うＷＤＴクリア要求手段とを有し、
オペレーティングシステムの処理プログラム又はオペレーティングシステム配下のドライバプログラムのハングによる前記ドライバ生存確認手段又は前記ＷＤＴクリア要求手段の動作停止で前記ウォッチドッグタイマのクリアが停止し、前記ウォッチドッグタイマの閾値超過で前記プロセッサのリセットを行うことを特徴とするコンピュータ装置。
（付記２）
付記１記載のコンピュータ装置において、
前記ドライバ生存確認手段は、受信したシステムクロック割り込みを基にカウント要求を更に生成し、
前記オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行うシステム監視手段と、
アプリケーションプログラムの初期化を行う初期化制御手段と、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に少なくとも実行中のアプリケーションプログラムの初期化を指示するアプリハング処理手段とを
有することを特徴とするコンピュータ装置。
（付記３）
付記２記載のコンピュータ装置において、
閾値の異なる複数のハングカウンタを設けたことを特徴とするコンピュータ装置。
（付記４）
付記３記載のコンピュータ装置において、
前記アプリハング処理手段は、前記複数のハングカウンタのうち第１のハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に実行中のアプリケーションプログラムの初期化を指示することを特徴とするコンピュータ装置。
（付記５）
付記４記載のコンピュータ装置において、
前記アプリハング処理手段は、前記複数のハングカウンタのうち第２のハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に実行中のアプリケーションプログラムと同一グループのアプリケーションプログラムの初期化を指示することを特徴とするコンピュータ装置。
（付記６）
付記５記載のコンピュータ装置において、
前記アプリハング処理手段は、前記複数のハングカウンタのうち第３のハングカウンタのカウント値が閾値を超えたとき、前記プロセッサのリセットを行うことを特徴とするコンピュータ装置。
（付記７）
プロセッサのシステムクロック割り込みを受信する割り込み受信手段と、
受信したシステムクロック割り込みを基にカウント要求を生成するドライバ生存確認手段と、
オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行うシステム監視手段と、
アプリケーションプログラムの初期化を行う初期化制御手段と、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に少なくとも実行中のアプリケーションプログラムの初期化を指示するアプリハング処理手段とを
有することを特徴とするコンピュータ装置。
（付記８）
プロセッサのシステムクロック割り込みを受信し、
受信したシステムクロック割り込みを基にＷＤＴ生存確認要求を生成し、
前記ＷＤＴ生存確認要求を受信して前記プロセッサ内のウォッチドッグタイマのクリア要求を行い、
オペレーティングシステムの処理プログラム又はオペレーティングシステム配下のドライバプログラムのハングによる前記ＷＤＴ生存確認要求の生成又は前記ウォッチドッグタイマのクリア要求の停止で前記ウォッチドッグタイマのクリアが停止し、前記ウォッチドッグタイマの閾値超過で前記プロセッサのリセットを行うことを特徴とするプログラムハング検出方法。
（付記９）
付記８記載のプログラムハング検出方法において、
前記受信したシステムクロック割り込みを基にカウント要求を更に生成し、
前記オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行い、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記アプリケーションプログラムの初期化を指示する、
ことを特徴とするプログラムハング検出方法。
（付記１０）
プロセッサのシステムクロック割り込みを受信し、
受信したシステムクロック割り込みを基にカウント要求を生成し、
オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行い、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記アプリケーションプログラムの初期化を指示する、
ことを特徴とするプログラムハング検出方法。
（付記１１）
付記２記載のコンピュータ装置において、
前記ハングカウンタのカウント値の閾値は、前記ウォッチドッグタイマの閾値より小さいことを特徴とするコンピュータ装置。
（付記１２）
付記２記載のコンピュータ装置において、
前記アプリハング処理手段は、前記ハングカウンタのカウント値が閾値を超えたとき、ハング回数をカウントし、ハング回数に応じて前記初期化制御手段に異なる初期化指示を行うことを特徴とするコンピュータ装置。
In this way, when the OS or a driver under the OS hangs, the system is reset by a WDT reset from the hardware, and when an application above the OS hangs, only the application is reset while the OS keeps running. Because the OS is still operating, the task detail information and the hang count of the hung application can be saved in the kernel log area 35 of the nonvolatile memory, which makes it easy to identify the suspected task.
(Appendix 1)
A computer apparatus comprising: interrupt receiving means for receiving a system clock interrupt of a processor;
driver survival confirmation means for generating a WDT survival confirmation request based on the received system clock interrupt; and
WDT clear request means for receiving the WDT survival confirmation request and requesting a clear of a watchdog timer in the processor,
wherein clearing of the watchdog timer stops when the driver survival confirmation means or the WDT clear request means stops operating because of a hang of a processing program of an operating system or of a driver program under the operating system, and the processor is reset when the watchdog timer exceeds its threshold.
(Appendix 2)
The computer apparatus according to Appendix 1,
wherein the driver survival confirmation means further generates a count request based on the received system clock interrupt, the computer apparatus further comprising:
system monitoring means for monitoring the operation of application programs above the operating system and issuing a survival notification;
initialization control means for initializing application programs; and
application hang processing means for counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and, when the count value of the hang counter exceeds a threshold, instructing the initialization control means to initialize at least the application program being executed.
(Appendix 3)
The computer apparatus according to Appendix 2,
wherein a plurality of hang counters having different thresholds are provided.
(Appendix 4)
The computer apparatus according to Appendix 3,
wherein the application hang processing means instructs the initialization control means to initialize the application program being executed when the count value of a first hang counter of the plurality of hang counters exceeds its threshold.
(Appendix 5)
The computer apparatus according to Appendix 4,
wherein the application hang processing means instructs the initialization control means to initialize the application programs in the same group as the application program being executed when the count value of a second hang counter of the plurality of hang counters exceeds its threshold.
(Appendix 6)
The computer apparatus according to Appendix 5,
wherein the application hang processing means resets the processor when the count value of a third hang counter of the plurality of hang counters exceeds its threshold.
(Appendix 7)
A computer apparatus comprising: interrupt receiving means for receiving a system clock interrupt of a processor;
driver survival confirmation means for generating a count request based on the received system clock interrupt;
system monitoring means for monitoring the operation of application programs above an operating system and issuing a survival notification;
initialization control means for initializing application programs; and
application hang processing means for counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and, when the count value of the hang counter exceeds a threshold, instructing the initialization control means to initialize at least the application program being executed.
(Appendix 8)
A program hang detection method comprising: receiving a system clock interrupt of a processor;
generating a WDT survival confirmation request based on the received system clock interrupt; and
receiving the WDT survival confirmation request and requesting a clear of a watchdog timer in the processor,
wherein clearing of the watchdog timer stops when generation of the WDT survival confirmation request or the watchdog timer clear request stops because of a hang of a processing program of an operating system or of a driver program under the operating system, and the processor is reset when the watchdog timer exceeds its threshold.
(Appendix 9)
The program hang detection method according to Appendix 8, further comprising:
generating a count request based on the received system clock interrupt;
monitoring the operation of application programs above the operating system and issuing a survival notification; and
counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and instructing initialization of the application program when the count value of the hang counter exceeds a threshold.
(Appendix 10)
A program hang detection method comprising: receiving a system clock interrupt of a processor;
generating a count request based on the received system clock interrupt;
monitoring the operation of application programs above an operating system and issuing a survival notification; and
counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and instructing initialization of the application program when the count value of the hang counter exceeds a threshold.
(Appendix 11)
The computer apparatus according to Appendix 2,
wherein the threshold of the count value of the hang counter is smaller than the threshold of the watchdog timer.
(Appendix 12)
The computer apparatus according to Appendix 2,
wherein the application hang processing means counts the number of hangs when the count value of the hang counter exceeds the threshold, and issues a different initialization instruction to the initialization control means according to the number of hangs.
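The watchdog path of Appendices 1 and 8 can be sketched as a small Python simulation. It is an illustration under stated assumptions, not the actual driver code. The `Watchdog` and `TickChain` classes, the `simulate` helper, and the threshold of 5 ticks are hypothetical. The sketch models only the chain "system clock interrupt, then survival confirmation request, then WDT clear", and the fact that a hang anywhere in that chain stops the clear.

```python
from typing import Optional


class Watchdog:
    """Toy hardware watchdog: counts up on its own and is cleared by software."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # assumed value; not given in the text
        self.count = 0
        self.reset_fired = False

    def clear(self) -> None:
        self.count = 0

    def hw_tick(self) -> None:
        # The timer advances regardless of whether software is alive.
        self.count += 1
        if self.count > self.threshold:
            self.reset_fired = True


class TickChain:
    """Interrupt reception -> survival confirmation -> WDT clear request."""

    def __init__(self, wdt: Watchdog) -> None:
        self.wdt = wdt
        self.hung = False  # True once an OS process or driver has hung

    def on_system_clock_interrupt(self) -> None:
        if self.hung:
            return  # a hung OS or driver never reaches the clear request
        self.wdt.clear()


def simulate(ticks: int, hang_at: Optional[int], wdt_threshold: int = 5) -> Optional[int]:
    """Return the tick at which the processor is reset, or None if it never is."""
    wdt = Watchdog(wdt_threshold)
    chain = TickChain(wdt)
    for t in range(ticks):
        if hang_at is not None and t == hang_at:
            chain.hung = True
        wdt.hw_tick()
        chain.on_system_clock_interrupt()
        if wdt.reset_fired:
            return t
    return None
```

With these assumed values, `simulate(20, None)` never resets, while `simulate(20, 3)` resets once the watchdog has gone more than five ticks without a clear. Per Appendix 11, a hang counter threshold would be chosen smaller than `wdt_threshold`, so that an application hang is handled before the watchdog could fire.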

従来のプログラムハング検出方法を用いたコンピュータ装置の一例の構成図である。 It is a block diagram of an example of a computer apparatus using the conventional program hang detection method.
従来方法の問題点を説明するための図である。 It is a diagram for explaining a problem of the conventional method.
従来方法の問題点を説明するための図である。 It is a diagram for explaining a problem of the conventional method.
プログラムハング発生時の処理の流れを示す図である。 It is a diagram showing the flow of processing when a program hang occurs.
本発明のプログラムハング検出方法を適用したコンピュータ装置の一実施形態のハードウェア構成例を示す図である。 It is a diagram showing a hardware configuration example of one embodiment of a computer apparatus to which the program hang detection method of the present invention is applied.
本発明のプログラムハング検出方法を適用したコンピュータ装置の一実施形態の構成例を示す図である。 It is a diagram showing a configuration example of one embodiment of a computer apparatus to which the program hang detection method of the present invention is applied.
ＯＳより上位のアプリのプログラムハング検出部を抽出した構成例を示す図である。 It is a diagram showing a configuration example in which the program hang detection unit for applications above the OS is extracted.
ＯＳ処理及びＯＳ配下のドライバのプログラムハング検出部を抽出した構成例を示す図である。 It is a diagram showing a configuration example in which the program hang detection unit for OS processing and drivers under the OS is extracted.
コンピュータ装置の通常動作時の動作シーケンスである。 It is an operation sequence during normal operation of the computer apparatus.
コンピュータ装置のドライバのハング検出時の動作シーケンスである。 It is an operation sequence when a hang of a driver of the computer apparatus is detected.
コンピュータ装置のアプリのハング検出時の動作シーケンスである。 It is an operation sequence when a hang of an application of the computer apparatus is detected.

符号の説明 Explanation of symbols

１０ ＣＰＵ
１１ ＲＡＭ
１２ ＲＯＭ
１３ チップセット
１４ 内部バス
１５ フラッシュメモリ
１６ ハードディスク装置
１７ 通信回路
２１ 割り込み受信機能部
２２ ＴＩＣドライバ生存確認機能部
２３ アプリハングドライバ
２４ ＷＤＴドライバ
２５ ハングカウンタ
３０ プロセッサ
３１ ＷＤＴ
３２ ＣＰＵコア
３３ システムタイマ
４０ ＯＳ
４１Ａ〜４１Ｃ ＯＳ処理
４２Ａ〜４２Ｃ ドライバ
５０Ａ〜５０Ｃ アプリ
５１ システム監視アプリ
５２ 初期化制御アプリ

10 CPU
11 RAM
12 ROM
13 Chipset
14 Internal bus
15 Flash memory
16 Hard disk device
17 Communication circuit
21 Interrupt reception function unit
22 TIC driver survival confirmation function unit
23 Application hang driver
24 WDT driver
25 Hang counter
30 Processor
31 WDT
32 CPU core
33 System timer
40 OS
41A to 41C OS processing
42A to 42C Driver
50A to 50C Application
51 System monitoring application
52 Initialization control application

Claims

プロセッサのシステムクロック割り込みを受信する割り込み受信手段と、
受信したシステムクロック割り込みを基にＷＤＴ生存確認要求を生成するドライバ生存確認手段と、
前記ＷＤＴ生存確認要求を受信して前記プロセッサ内のウォッチドッグタイマのクリア要求を行うＷＤＴクリア要求手段とを有し、
オペレーティングシステムの処理プログラム又はオペレーティングシステム配下のドライバプログラムのハングによる前記ドライバ生存確認手段又は前記ＷＤＴクリア要求手段の動作停止で前記ウォッチドッグタイマのクリアが停止し、前記ウォッチドッグタイマの閾値超過で前記プロセッサのリセットを行うことを特徴とするコンピュータ装置。
A computer apparatus comprising: interrupt receiving means for receiving a system clock interrupt of a processor;
driver survival confirmation means for generating a WDT survival confirmation request based on the received system clock interrupt; and
WDT clear request means for receiving the WDT survival confirmation request and requesting a clear of a watchdog timer in the processor,
wherein clearing of the watchdog timer stops when the driver survival confirmation means or the WDT clear request means stops operating because of a hang of a processing program of an operating system or of a driver program under the operating system, and the processor is reset when the watchdog timer exceeds its threshold.

請求項１記載のコンピュータ装置において、
前記ドライバ生存確認手段は、受信したシステムクロック割り込みを基にカウント要求を更に生成し、
前記オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行うシステム監視手段と、
アプリケーションプログラムの初期化を行う初期化制御手段と、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に少なくとも実行中のアプリケーションプログラムの初期化を指示するアプリハング処理手段とを
有することを特徴とするコンピュータ装置。
The computer apparatus according to claim 1,
wherein the driver survival confirmation means further generates a count request based on the received system clock interrupt, the computer apparatus further comprising:
system monitoring means for monitoring the operation of application programs above the operating system and issuing a survival notification;
initialization control means for initializing application programs; and
application hang processing means for counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and, when the count value of the hang counter exceeds a threshold, instructing the initialization control means to initialize at least the application program being executed.

請求項２記載のコンピュータ装置において、
閾値の異なる複数のハングカウンタを設けたことを特徴とするコンピュータ装置。
The computer apparatus according to claim 2,
wherein a plurality of hang counters having different thresholds are provided.

請求項３記載のコンピュータ装置において、
前記アプリハング処理手段は、前記複数のハングカウンタのうち第１のハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に実行中のアプリケーションプログラムの初期化を指示することを特徴とするコンピュータ装置。
The computer apparatus according to claim 3,
wherein the application hang processing means instructs the initialization control means to initialize the application program being executed when the count value of a first hang counter of the plurality of hang counters exceeds its threshold.

請求項４記載のコンピュータ装置において、
前記アプリハング処理手段は、前記複数のハングカウンタのうち第２のハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に実行中のアプリケーションプログラムと同一グループのアプリケーションプログラムの初期化を指示することを特徴とするコンピュータ装置。
The computer apparatus according to claim 4,
wherein the application hang processing means instructs the initialization control means to initialize the application programs in the same group as the application program being executed when the count value of a second hang counter of the plurality of hang counters exceeds its threshold.

請求項５記載のコンピュータ装置において、
前記アプリハング処理手段は、前記複数のハングカウンタのうち第３のハングカウンタのカウント値が閾値を超えたとき、前記プロセッサのリセットを行うことを特徴とするコンピュータ装置。
The computer apparatus according to claim 5,
wherein the application hang processing means resets the processor when the count value of a third hang counter of the plurality of hang counters exceeds its threshold.

プロセッサのシステムクロック割り込みを受信する割り込み受信手段と、
受信したシステムクロック割り込みを基にカウント要求を生成するドライバ生存確認手段と、
オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行うシステム監視手段と、
アプリケーションプログラムの初期化を行う初期化制御手段と、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記初期化制御手段に少なくとも実行中のアプリケーションプログラムの初期化を指示するアプリハング処理手段とを
有することを特徴とするコンピュータ装置。
A computer apparatus comprising: interrupt receiving means for receiving a system clock interrupt of a processor;
driver survival confirmation means for generating a count request based on the received system clock interrupt;
system monitoring means for monitoring the operation of application programs above an operating system and issuing a survival notification;
initialization control means for initializing application programs; and
application hang processing means for counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and, when the count value of the hang counter exceeds a threshold, instructing the initialization control means to initialize at least the application program being executed.

プロセッサのシステムクロック割り込みを受信し、
受信したシステムクロック割り込みを基にＷＤＴ生存確認要求を生成し、
前記ＷＤＴ生存確認要求を受信して前記プロセッサ内のウォッチドッグタイマのクリア要求を行い、
オペレーティングシステムの処理プログラム又はオペレーティングシステム配下のドライバプログラムのハングによる前記ＷＤＴ生存確認要求の生成又は前記ウォッチドッグタイマのクリア要求の停止で前記ウォッチドッグタイマのクリアが停止し、前記ウォッチドッグタイマの閾値超過で前記プロセッサのリセットを行うことを特徴とするプログラムハング検出方法。
A program hang detection method comprising: receiving a system clock interrupt of a processor;
generating a WDT survival confirmation request based on the received system clock interrupt; and
receiving the WDT survival confirmation request and requesting a clear of a watchdog timer in the processor,
wherein clearing of the watchdog timer stops when generation of the WDT survival confirmation request or the watchdog timer clear request stops because of a hang of a processing program of an operating system or of a driver program under the operating system, and the processor is reset when the watchdog timer exceeds its threshold.

請求項８記載のプログラムハング検出方法において、
前記受信したシステムクロック割り込みを基にカウント要求を更に生成し、
前記オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行い、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記アプリケーションプログラムの初期化を指示する、
ことを特徴とするプログラムハング検出方法。
The program hang detection method according to claim 8, further comprising:
generating a count request based on the received system clock interrupt;
monitoring the operation of application programs above the operating system and issuing a survival notification; and
counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and instructing initialization of the application program when the count value of the hang counter exceeds a threshold.

プロセッサのシステムクロック割り込みを受信し、
受信したシステムクロック割り込みを基にカウント要求を生成し、
オペレーティングシステムより上位のアプリケーションプログラムの動作を監視して生存通知を行い、
前記カウント要求を受信したときハングカウンタのカウントを行い、前記生存通知を受信したとき前記ハングカウンタのカウント値をリセットし、前記ハングカウンタのカウント値が閾値を超えたとき、前記アプリケーションプログラムの初期化を指示する、
ことを特徴とするプログラムハング検出方法。
A program hang detection method comprising: receiving a system clock interrupt of a processor;
generating a count request based on the received system clock interrupt;
monitoring the operation of application programs above an operating system and issuing a survival notification; and
counting a hang counter when the count request is received, resetting the count value of the hang counter when the survival notification is received, and instructing initialization of the application program when the count value of the hang counter exceeds a threshold.