CN102609350A - Server memory failure alarm method - Google Patents
Server memory failure alarm method
- Publication number
- CN102609350A
- Authority
- CN
- China
- Prior art keywords
- memory
- error
- failure
- error message
- alarm
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention provides a server memory failure alarm method. A software program identifies memory error messages during the startup stage of a server system; the messages are passed to a management chip embedded on the mainboard, where they are classified, and alarms are raised according to severity level. The system comprises a failure information recognition unit, an error message database, and an alarm unit. The failure information recognition unit acquires the error messages sent by the memory components in the system as the basis for judging memory failure; the error message database collects the memory error messages passed to it; and the alarm unit selects different failure alarm modes according to different error messages. Implementing this memory failure alarm method in a server system can greatly enhance system reliability, facilitate maintenance, and improve the overall image of the system.
Description
Technical background
In current server systems, the prior art relies only on a hardware fault circuit signal triggered by the memory component, with the fault indicated by an on-board LED. This type of design has the following shortcomings:
1. Failure information cannot be recorded: once the system is powered off, the failure information found during that boot is cleared;
2. The failure modes the system can recognize are limited: only simple error information that the memory component itself can detect is supported, for example an excessive memory temperature or an excessive count of recorded IO errors. For errors that the memory component itself cannot detect or report, such as a memory chip fault or a memory configuration error, the server system cannot produce an alarm;
3. Alarms cannot be graded according to the severity of the fault.
Summary of the invention
A software program identifies memory error messages during the startup phase of the server system; the messages are passed to the management chip embedded on the mainboard, which classifies them and raises alarms by level. The system comprises a failure information recognition unit (1), an error message database (2), and an alarm unit (3), wherein:
The failure information recognition unit (1) is responsible for obtaining the error messages sent by the memory components in the system as the basis for judging memory failure;
The error message database (2) is responsible for collecting the memory error messages passed to it;
The alarm unit (3) is responsible for selecting different failure alarm modes according to different error messages;
The alarm flow is as follows:
After the system powers on, it checks whether the error message database contains historical errors and detects whether the fault still exists, wherein:
1) if the fault exists, the error messages are classified into different faults, and alarms are raised in different ways according to the fault level;
2) if the fault does not exist, the system checks whether the memory sends error messages during this boot: a) if the memory sends error messages, they are recorded in the error message database and classified into different faults, and alarms are raised according to the fault; b) if the memory sends no error messages, the historical data in the error message database is cleared.
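The alarm flow above can be sketched in Python as follows. This is only an illustrative reading of the three branches, not the BIOS or BMC implementation: the names `ErrorDatabase`, `boot_alarm_flow`, `fault_still_present`, and `raise_alarm` are hypothetical, and the in-memory list stands in for the error message database.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorDatabase:
    """In-memory stand-in for the error message database (assumption)."""
    records: list = field(default_factory=list)

    def record(self, messages) -> None:
        self.records.extend(messages)

    def clear(self) -> None:
        self.records.clear()


def boot_alarm_flow(db, fault_still_present, boot_errors, raise_alarm):
    """Run the power-on flow once and return the name of the branch taken.

    fault_still_present: callable(record) -> bool, hypothetical re-check of a
        historical error.
    boot_errors: error messages reported by the memory during this boot.
    raise_alarm: callable(message), hypothetical hook into the alarm unit.
    """
    # Branch 1: a historical error exists and the fault is still present.
    persisting = [r for r in db.records if fault_still_present(r)]
    if persisting:
        for message in persisting:
            raise_alarm(message)
        return "historical_fault"
    # Branch 2a: no persisting fault, but the memory reported errors this boot.
    if boot_errors:
        db.record(boot_errors)
        for message in boot_errors:
            raise_alarm(message)
        return "new_fault"
    # Branch 2b: nothing was reported, so the historical data is cleared.
    db.clear()
    return "clean"
```

The text does not say what happens to stale historical records in branch 2a; this sketch simply leaves them in place.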
The beneficial effect of the invention is as follows: the alarm unit is included, as a software process, in the monitoring program in the BMC; it can grade fault alarms according to the error data recorded in the error message database and, through a Debug digital display, an LED, or a buzzer, raise classified alarms according to the severity level of the error.
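A graded alarm selection of this kind might look like the sketch below. The fault names follow the examples given in the technical background, but the three severity levels, the severity assigned to each fault, and the pairing of levels with output devices are all assumptions made for illustration; the text does not specify them.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Hypothetical classification table: the severity given to each fault is an
# assumption, not taken from the patent text.
FAULT_SEVERITY = {
    "io_error_count_exceeded": Severity.MINOR,
    "memory_over_temperature": Severity.MAJOR,
    "memory_configuration_error": Severity.MAJOR,
    "memory_chip_fault": Severity.CRITICAL,
}

# Hypothetical mapping from severity to the alarm outputs named in the text.
ALARM_CHANNELS = {
    Severity.MINOR: ("debug_display",),
    Severity.MAJOR: ("debug_display", "led"),
    Severity.CRITICAL: ("debug_display", "led", "buzzer"),
}


def select_alarm(fault: str):
    """Return (severity, channels); unknown faults are treated as CRITICAL."""
    severity = FAULT_SEVERITY.get(fault, Severity.CRITICAL)
    return severity, ALARM_CHANNELS[severity]
```

Treating unknown faults as the highest level is a conservative design choice for the sketch, so that an unclassified error is never silently downgraded.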
Implementing this memory failure alarm method in a server system can greatly improve system reliability, ease maintenance, and enhance the overall image of the product.
Description of the drawings
Fig. 1 is the alarm flow chart of the present invention.
Embodiment
The alarm method of the present invention is described in detail below with reference to the accompanying drawing.
In the method of the present invention, the failure information recognition unit is included in the BIOS as a software process. At system startup, it queries the error message database for historical error messages and checks whether memory error messages exist during this boot.
Through software detection, the failure information recognition unit can identify not only (1) the hardware fault circuit signal sent by the memory component itself, but also (2) error messages that the memory component cannot trigger by itself, such as a memory chip fault or a memory configuration error. Through a monitoring channel, the failure information recognition unit (1) can determine whether memory error messages exist in the error message database (2) or in the current startup process.
The error message database is stored in the Flash of the management chip (BMC) embedded on the mainboard. When the system loses power, the error messages remain in Flash and are not lost, so that at the next boot the system can detect the memory failure found last time.
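The persistence property described here, error records surviving a power cycle, can be illustrated with a small file-backed log. This is a sketch under stated assumptions only: a JSON file stands in for the BMC Flash, and the class name `PersistentErrorLog` and its methods are hypothetical, not an actual BMC interface.

```python
import json
import os
import tempfile


class PersistentErrorLog:
    """File-backed stand-in for the Flash-resident error message database."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # A missing file means no history has been recorded yet.
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as handle:
            return json.load(handle)

    def append(self, message):
        records = self.load()
        records.append(message)
        self._write(records)

    def clear(self):
        self._write([])

    def _write(self, records):
        # Write to a temporary file and rename it into place, so that an
        # interrupted write does not leave a half-written log behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(records, handle)
        os.replace(tmp_path, self.path)
```

A new `PersistentErrorLog` object created on the same path after a restart reads back whatever the previous run recorded, which mirrors how the next boot finds the failure logged last time.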
Claims (1)
1. A server memory failure alarm method, characterized in that a software program identifies memory error messages during the startup phase of a server system, and the messages are passed to a management chip embedded on the mainboard, which classifies them and raises alarms by level; the system comprises a failure information recognition unit, an error message database, and an alarm unit, wherein:
The failure information recognition unit is responsible for obtaining the error messages sent by the memory components in the system as the basis for judging memory failure;
The error message database is responsible for collecting the memory error messages passed to it;
The alarm unit is responsible for selecting different failure alarm modes according to different error messages;
The alarm steps are as follows:
After the system powers on, it checks whether the error message database contains historical errors and detects whether the fault still exists, wherein:
1) if the fault exists, the error messages are classified into different faults, and alarms are raised in different ways according to the fault level;
2) if the fault does not exist, it checks whether the memory sends error messages during this boot, comprising: a) if the memory sends error messages, they are recorded in the error message database and classified into different faults, and alarms are raised according to the fault; b) if the memory sends no error messages, the historical data in the error message database is cleared.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012100332686A CN102609350A (en) | 2012-02-15 | 2012-02-15 | Server memory failure alarm method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012100332686A CN102609350A (en) | 2012-02-15 | 2012-02-15 | Server memory failure alarm method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102609350A (en) | 2012-07-25 |
Family
ID=46526740
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012100332686A Pending CN102609350A (en) | 2012-02-15 | 2012-02-15 | Server memory failure alarm method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102609350A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103019898A (en) * | 2012-11-26 | 2013-04-03 | 加弘科技咨询(上海)有限公司 | Error reporting system for memory module detection and slot position traffic light positioning |
CN103077103A (en) * | 2013-01-18 | 2013-05-01 | 浪潮电子信息产业股份有限公司 | Off-line diagnosing method for server faults |
CN103500133A (en) * | 2013-09-17 | 2014-01-08 | 华为技术有限公司 | Fault locating method and device |
CN103995768A (en) * | 2014-06-10 | 2014-08-20 | 浪潮电子信息产业股份有限公司 | Visual quick diagnosing method of server faults |
CN104021054A (en) * | 2014-06-11 | 2014-09-03 | 浪潮(北京)电子信息产业有限公司 | Server fault visual detecting and processing method and system and programmable chip |
CN105159813A (en) * | 2015-08-05 | 2015-12-16 | 北京百度网讯科技有限公司 | Data center based fault alarming method, apparatus, management device and system |
CN108959025A (en) * | 2018-06-27 | 2018-12-07 | 郑州云海信息技术有限公司 | A kind of server alarm method, device and server |
CN110780646A (en) * | 2019-09-21 | 2020-02-11 | 苏州浪潮智能科技有限公司 | Memory quality early warning method based on MES system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1570859A (en) * | 2003-07-16 | 2005-01-26 | 联想(北京)有限公司 | Design method for avoiding misuse of non-ECC memory |
CN1845029A (en) * | 2005-11-11 | 2006-10-11 | 南京科远控制工程有限公司 | Setting method for fault diagnosis and accident prediction |
US20070168738A1 (en) * | 2005-12-12 | 2007-07-19 | Inventec Corporation | Power-on error detection system and method |
CN101539881A (en) * | 2008-03-18 | 2009-09-23 | 环达电脑(上海)有限公司 | Device and method for detecting memory errors |
US20100017660A1 (en) * | 2008-07-15 | 2010-01-21 | Caterpillar Inc. | System and method for protecting memory stacks using a debug unit |
CN101741600A (en) * | 2008-11-27 | 2010-06-16 | 英业达股份有限公司 | Server system, recording equipment and management method thereof |
CN101833492A (en) * | 2010-04-15 | 2010-09-15 | 浪潮电子信息产业股份有限公司 | Method for detecting memory failure |
CN101908984A (en) * | 2010-06-30 | 2010-12-08 | 杭州华三通信技术有限公司 | Method and single board for detecting faults of memory |
CN102222025A (en) * | 2011-06-17 | 2011-10-19 | 华为数字技术有限公司 | Method and device for eliminating memory failure |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103019898A (en) * | 2012-11-26 | 2013-04-03 | 加弘科技咨询(上海)有限公司 | Error reporting system for memory module detection and slot position traffic light positioning |
CN103019898B (en) * | 2012-11-26 | 2017-02-08 | 加弘科技咨询(上海)有限公司 | Error reporting system for memory module detection and slot position traffic light positioning |
CN103077103A (en) * | 2013-01-18 | 2013-05-01 | 浪潮电子信息产业股份有限公司 | Off-line diagnosing method for server faults |
CN103500133A (en) * | 2013-09-17 | 2014-01-08 | 华为技术有限公司 | Fault locating method and device |
CN103995768A (en) * | 2014-06-10 | 2014-08-20 | 浪潮电子信息产业股份有限公司 | Visual quick diagnosing method of server faults |
CN104021054A (en) * | 2014-06-11 | 2014-09-03 | 浪潮(北京)电子信息产业有限公司 | Server fault visual detecting and processing method and system and programmable chip |
CN105159813A (en) * | 2015-08-05 | 2015-12-16 | 北京百度网讯科技有限公司 | Data center based fault alarming method, apparatus, management device and system |
CN105159813B (en) * | 2015-08-05 | 2018-09-14 | 北京百度网讯科技有限公司 | Fault alarm method, device, management equipment based on data center and system |
CN108959025A (en) * | 2018-06-27 | 2018-12-07 | 郑州云海信息技术有限公司 | A kind of server alarm method, device and server |
CN110780646A (en) * | 2019-09-21 | 2020-02-11 | 苏州浪潮智能科技有限公司 | Memory quality early warning method based on MES system |
CN110780646B (en) * | 2019-09-21 | 2021-11-26 | 苏州浪潮智能科技有限公司 | Memory quality early warning method based on MES system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN102609350A (en) | Server memory failure alarm method | |
CN109783262B (en) | Fault data processing method, device, server and computer readable storage medium | |
CN104268061B (en) | A kind of storage state monitoring method suitable for virtual machine | |
CN106789306B (en) | Method and system for detecting, collecting and recovering software fault of communication equipment | |
CN105045689A (en) | Method for monitoring and alarming hard disks by using RAID card batch detection | |
CN104796273A (en) | Method and device for diagnosing root of network faults | |
CN109491819A (en) | A kind of method and system of diagnosis server failure | |
CN112395156A (en) | Fault warning method and device, storage medium and electronic equipment | |
CN111459782A (en) | Method and device for monitoring business system, cloud platform system and server | |
CN103401698A (en) | Monitoring system used for alarming server status in server cluster operation | |
CN105607973B (en) | Method, device and system for processing equipment fault in virtual machine system | |
CN102404141A (en) | Method and device of alarm inhibition | |
JP2015109069A (en) | Fault symptom notification apparatus, symptom notification method and symptom notification program | |
CN110763952A (en) | Underground cable fault monitoring method and device | |
WO2024148857A1 (en) | Method and apparatus for filtering root cause of server fault, and non-volatile readable storage medium and electronic apparatus | |
CN110784352B (en) | Data synchronous monitoring and alarming method and device based on Oracle golden gate | |
CN105300447A (en) | System and method for monitoring operation state of equipment | |
CN103605592A (en) | Mechanism of detecting malfunctions of distributed computer system | |
CN114327968A (en) | Method and device for realizing early warning of server hardware fault telephone with universal interface | |
CN112306871A (en) | Data processing method, device, equipment and storage medium | |
JP2006268515A (en) | Pci card trouble management system | |
CN103761157A (en) | Method for implementing system fault-tolerant mechanism on basis of multitask patrol strategy | |
JP5803246B2 (en) | Network operation management system, network monitoring server, network monitoring method and program | |
CN105955864A (en) | Power supply fault processing method, power supply module, monitoring management module and server | |
CN103514086A (en) | Extraction method and device for software error report |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20120725 |