CN107346269A - Method and system for management controller failure protection in a server - Google Patents
Method and system for management controller failure protection in a server
- Publication number
- CN107346269A (application CN201710517705.4A)
- Authority
- CN
- China
- Prior art keywords
- management controller
- protecting device
- failure
- sent
- timing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
- G06F11/0757—Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Programmable Controllers (AREA)
Abstract
A method and system for management controller failure protection in a server. First, the debug signal output and the reset signal input of the server's management controller are each connected to a failure protection device. The failure protection device then evaluates the data sent by the management controller. When the data is initialization information of the management controller, the device takes no action. When the data is the specified data that the management controller sends at intervals, the device starts timing and clears the timer value the next time it receives the same specified data. If the timer value reaches the preset value without the device having received the specified data from the management controller, the device sends a reset signal that resets the management controller. The method and system shorten the time needed to reset and recover the management controller after a failure, increase the time the management controller spends in normal operation, and improve the reliability of the server.
Description
Technical field
The present invention relates to a server protection system in the field of computer technology, and more particularly to a method and system for management controller failure protection in a server.
Background art
While a programmable controller in a server is operating, problems such as malfunction or failure to execute the preset program can occur. In such cases the programmable controller has to be reset. The prior art handles this as follows: the reset signal of the programmable controller is connected to a watchdog circuit or device, and the programmable controller continually sends pulse signals to the watchdog circuit. When the programmable controller fails, its program cannot run and the pulse signal is no longer sent. The watchdog circuit times continuously and clears its timer value whenever it receives a pulse signal. Once the pulse signal stops arriving, the timer is not cleared in time and its value reaches the preset value; the watchdog circuit then sends a reset signal to the programmable controller so that it returns to normal operation.
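The conventional scheme described above can be sketched in a few lines of Python. This is an illustration only: the class name `ConventionalWatchdog`, the helper `run`, the 5-second timeout and the one-second tick are assumptions made for the example, not details taken from the patent.

```python
class ConventionalWatchdog:
    """Prior-art watchdog: times continuously, cleared by each pulse (illustrative)."""

    def __init__(self, timeout_s: int) -> None:
        self.timeout_s = timeout_s  # preset value (an assumed figure in the demo)
        self.elapsed_s = 0          # current timer value

    def pulse(self) -> None:
        """A pulse from the programmable controller clears the timer value."""
        self.elapsed_s = 0

    def tick(self, dt_s: int = 1) -> bool:
        """Advance time; return True when a reset signal would be sent."""
        self.elapsed_s += dt_s
        if self.elapsed_s >= self.timeout_s:
            self.elapsed_s = 0  # timing restarts after the reset
            return True
        return False


def run(pulse_times: set, duration_s: int, timeout_s: int) -> list:
    """Return the times (in seconds) at which the watchdog resets the controller."""
    wd = ConventionalWatchdog(timeout_s)
    resets = []
    for t in range(1, duration_s + 1):
        if t in pulse_times:
            wd.pulse()          # pulse received at time t
        elif wd.tick():
            resets.append(t)    # no pulse for timeout_s seconds
    return resets
```

With a pulse every 2 s, `run({2, 4, 6, 8}, 8, 5)` produces no reset. If the pulses stop after t = 4, the timer value reaches 5 at t = 9 and a reset is issued.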
In server design, a management controller is commonly used to monitor and control the server's fans, voltages, power consumption, error information and so on. A management controller is one kind of the programmable controller described above; to ensure server reliability, it likewise has to be reset when it fails.
A management controller has to initialize when it powers on, and initialization takes some time, typically more than 1 minute. If the prior art above is used to monitor and reset the management controller, the watchdog timeout must exceed the time the management controller needs to initialize. Otherwise the watchdog circuit resets the management controller during power-on before initialization completes, and the controller can never start. The drawback of a long timeout is that when the management controller fails, it has to wait much longer before it is reset and returns to normal operation. During that period the management controller cannot monitor important server information, which harms the reliability of the server.
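The trade-off can be made concrete with a small simulation. The sketch below assumes, as in the conventional scheme, that the controller sends no pulses while it initializes; the function name and the numbers (60 s initialization, 30 s and 90 s timeouts) are illustrative assumptions.

```python
from typing import Optional


def resets_before_boot(init_time_s: int, timeout_s: int,
                       max_attempts: int = 5) -> Optional[int]:
    """Count watchdog resets before initialization completes.

    Returns None if the controller is still being reset after max_attempts,
    i.e. it effectively never starts.
    """
    for attempt in range(max_attempts):
        for elapsed_s in range(1, init_time_s + 1):
            if elapsed_s >= timeout_s:
                break  # watchdog fires mid-initialization; start over
        else:
            return attempt  # initialization finished without a reset
    return None
```

Under these assumptions a 30 s timeout never lets a 60 s initialization finish, while a 90 s timeout boots cleanly but leaves a later failure undetected for up to 90 s.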
Summary of the invention
The present invention provides a method and system for management controller failure protection in a server. It addresses the problem in the prior art that the watchdog timeout for a management controller is too long, so that reset and recovery take a long time and the management controller may be unable to monitor the server, which creates a safety hazard.
The present invention is achieved by the following technical solution:
A method for management controller failure protection in a server comprises the following steps:
S1. Connect the debug serial port signal output of the server's management controller to the signal input of a failure protection device, and connect the reset signal output of the failure protection device to the reset signal input of the management controller, so that the failure protection device controls the reset of the management controller.
S2. While the management controller is initializing, it continuously sends its current initialization state information to the failure protection device through the debug serial port signal output, indicating how far initialization has progressed. Once initialization is complete and the management controller is operating normally, it sends specified data to the failure protection device at intervals through the debug serial port signal output.
S3. The failure protection device evaluates the data sent from the debug serial port signal output of the management controller. When the data is initialization information of the management controller, the device takes no action. When the data is the specified data that the management controller sends at intervals, the device starts timing, and clears the timer value the next time it receives the same specified data. If the timer value reaches the preset value without the device having received the specified data from the debug serial port signal output, the device sends a reset signal that resets the management controller.
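The decision logic of step S3 can be sketched as a small state machine. This is a hedged illustration, not the patent's implementation: the names `Message` and `FailureProtectionDevice`, the one-second tick, and the behaviour after a reset (the timer idles until specified data arrives again) are assumptions introduced here.

```python
from enum import Enum


class Message(Enum):
    INIT_INFO = "initialization state information"
    SPECIFIED = "specified data sent at intervals"


class FailureProtectionDevice:
    """Illustrative model of step S3 (names and structure are assumptions)."""

    def __init__(self, preset_s: int = 30) -> None:
        self.preset_s = preset_s   # the text gives 20 to 40 seconds
        self.timing = False        # timer is idle until specified data arrives
        self.elapsed_s = 0
        self.resets_sent = 0

    def receive(self, message: Message) -> None:
        if message is Message.INIT_INFO:
            return                 # initialization information: no action
        self.timing = True         # first specified data starts the timer
        self.elapsed_s = 0         # every later one clears the timer value

    def tick(self, dt_s: int = 1) -> bool:
        """Advance time; return True if a reset signal is sent."""
        if not self.timing:
            return False
        self.elapsed_s += dt_s
        if self.elapsed_s < self.preset_s:
            return False
        self.resets_sent += 1
        # Assumption: after a reset the timer idles again until the
        # re-initialized controller sends its first specified data.
        self.timing = False
        self.elapsed_s = 0
        return True
```

Because the timer only runs once specified data has been seen, initialization can take as long as it needs without triggering a reset, while a failure in normal operation is caught within `preset_s` seconds.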
In the method described above, the preset value of the failure protection device's timer in step S3 is 20 to 40 seconds.
In the method described above, the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
In the method described above, the failure protection device is a watchdog circuit.
A system for management controller failure protection in a server comprises a server. The debug serial port signal output of the server's management controller is connected to the signal input of a failure protection device, and the reset signal output of the failure protection device is connected to the reset signal input of the management controller. The failure protection device has a comparison module and a timing module. The comparison module evaluates the data sent from the debug serial port signal output of the management controller. When the data is initialization information of the management controller, no action is taken. When the data is the specified data that the management controller sends at intervals, the timing module starts timing, and its timer value is cleared the next time the specified data is received. If the timer value reaches the preset value without the failure protection device having received the specified data from the debug serial port signal output, the device sends a reset signal that resets the management controller.
In the system described above, the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
In the system described above, the preset value of the timing module is 20 to 40 seconds.
Compared with the prior art, the advantages of the present invention are:
The present invention addresses the overly long watchdog timeout for management controllers in the prior art. The management controller sends different serial port information to the CPLD/FPGA during initialization and during normal operation; the CPLD/FPGA evaluates this information and decides whether to reset based on the result. The method and system shorten the time needed to reset and recover the management controller after a failure, increase the time the management controller spends in normal operation, and improve the reliability of the server.
Brief description of the drawings
To describe the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for that description are briefly introduced below.
Fig. 1 is an electrical schematic diagram of the system of the present invention.
Fig. 2 is a flow chart of the method of the present invention.
Reference numerals: 1 - management controller, 2 - serial port data line, 3 - reset data line, 4 - failure protection device, 41 - comparison module, 42 - timing module.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention.
As shown in Fig. 1, the system for management controller failure protection in a server of this embodiment comprises a server. The debug signal output of the server's management controller 1 is connected to the signal input of failure protection device 4 through serial port data line 2, and the reset signal output of failure protection device 4 is connected to the reset signal input of the management controller through reset data line 3, so that failure protection device 4 can reset management controller 1.
Failure protection device 4 has comparison module 41 and timing module 42. Comparison module 41 evaluates the data sent from the debug serial port of management controller 1. When the debug serial port sends initialization information of management controller 1, no action is taken. When the debug serial port sends the specified data that management controller 1 sends at intervals after initialization is complete, timing module 42 starts timing, and the timer value of timing module 42 is cleared the next time the specified data is received. If the timer value reaches the preset value without failure protection device 4 having received the specified data from the management controller, management controller 1 is reset by the reset signal. Further, the preset value of timing module 42 is 30 seconds.
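One way to picture the job of comparison module 41 is as a filter over the raw debug serial byte stream. The patent does not define a message format, so the newline framing, the `b"ALIVE"` payload and the class below are purely hypothetical choices for this sketch.

```python
class ComparisonModule:
    """Splits the debug serial stream into lines and spots the specified data.

    The newline framing and the b"ALIVE" payload are assumptions for this
    sketch; the patent does not define the message format.
    """

    def __init__(self, specified: bytes = b"ALIVE") -> None:
        self.specified = specified
        self._pending = b""  # bytes of a line that has not ended yet

    def feed(self, chunk: bytes) -> int:
        """Consume raw bytes; return how many specified-data lines completed."""
        self._pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, self._pending = self._pending.split(b"\n")
        return sum(1 for line in lines if line.strip() == self.specified)
```

Each non-zero return value would start or clear timing module 42. Every other line, including initialization information, is ignored, which matches the "no action" branch in the text.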
The present invention also provides a method for management controller failure protection in a server, comprising the following steps:
First, the debug signal output of the server's management controller 1 is connected to the signal input of failure protection device 4, and the reset signal output of failure protection device 4 is connected to the reset signal input of management controller 1, so that failure protection device 4 controls the reset of management controller 1.
While management controller 1 is initializing, it continuously sends its current initialization state information to failure protection device 4 through the debug serial port, indicating how far the initialization of management controller 1 has progressed. Once initialization is complete and management controller 1 is operating normally, it sends specified data to failure protection device 4 at intervals through the debug serial port.
Failure protection device 4 then evaluates the data sent from the debug serial port of management controller 1. When the debug serial port sends initialization information of management controller 1, no action is taken. When the debug serial port sends the specified data that management controller 1 sends at intervals after initialization is complete, failure protection device 4 starts timing, and clears its timer value the next time it receives the same specified data. If the timer value reaches the preset value without failure protection device 4 having received the specified data from the management controller, management controller 1 is reset by the reset signal.
Failure protection device 4 is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA), and the comparison module and timing module are provided in the CPLD/FPGA.
Specifically, as shown in Fig. 2, in this embodiment management controller 1 continually sends its current initialization state information through the debug serial port while it initializes, indicating which step the initialization has reached. Once the management controller has completed initialization and is operating normally, it sends the specified data to the CPLD/FPGA through the debug serial port at predetermined intervals.
The CPLD/FPGA evaluates the data sent from the debug serial port. If the debug serial port sends initialization information of management controller 1, no action is taken. If the debug serial port sends the specified data that management controller 1 sends periodically after initialization is complete, the CPLD/FPGA starts timing and clears the timer value when the specified data is received again. When the timer value reaches the preset value, management controller 1 is reset by the reset signal.
The method and system of the present invention shorten the time needed to reset and recover the management controller after a failure, increase the time management controller 1 spends in normal operation, and improve the reliability of the server.
Technical content not described in detail in the present invention is known technology.
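The controller side of this exchange can be sketched as a generator of timestamped serial lines. Everything concrete here is hypothetical: the `INIT:` and `ALIVE` message texts, the step names and durations, and the simplification of one status line per initialization step rather than a continuous stream.

```python
from typing import Iterator, List, Tuple


def debug_serial_lines(init_steps: List[Tuple[str, int]],
                       interval_s: int,
                       normal_run_s: int) -> Iterator[Tuple[int, str]]:
    """Yield (time_s, line) pairs written to the debug serial port.

    init_steps lists (step name, duration in seconds). All names, durations
    and message texts are hypothetical.
    """
    t = 0
    for name, duration_s in init_steps:
        yield t, f"INIT: {name}"   # reports which step has been reached
        t += duration_s
    end = t + normal_run_s
    while t <= end:
        yield t, "ALIVE"           # specified data, sent every interval_s
        t += interval_s
```

For example, `list(debug_serial_lines([("clocks", 5), ("sensors", 55)], 10, 30))` gives two `INIT:` lines at t = 0 and t = 5, followed by `ALIVE` at t = 60, 70, 80 and 90. For the CPLD/FPGA timer not to expire during normal operation, the sending interval has to be shorter than the preset value; 10 s against a 30 s preset is an assumed choice.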
Claims (7)
1. A method for management controller failure protection in a server, characterized in that it comprises the following steps:
S1. connecting the debug serial port signal output of the server's management controller to the signal input of a failure protection device, and connecting the reset signal output of the failure protection device to the reset signal input of the management controller, so that the failure protection device controls the reset of the management controller;
S2. while the management controller is initializing, the management controller continuously sends its current initialization state information to the failure protection device through the debug serial port signal output, to indicate how far initialization has progressed; once initialization is complete and the management controller is operating normally, the management controller sends specified data to the failure protection device at intervals through the debug serial port signal output;
S3. the failure protection device evaluates the data sent from the debug serial port signal output of the management controller: when the data is initialization information of the management controller, no action is taken; when the data is the specified data that the management controller sends at intervals, the failure protection device starts timing and clears its timer value the next time it receives the same specified data; when the timer value reaches the preset value without the failure protection device having received the specified data from the debug serial port signal output of the management controller, the failure protection device sends a reset signal that resets the management controller.
2. The method for management controller failure protection in a server according to claim 1, characterized in that the preset value of the failure protection device's timer in step S3 is 20 to 40 seconds.
3. The method for management controller failure protection in a server according to claim 1, characterized in that the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
4. The method for management controller failure protection in a server according to claim 1, characterized in that the failure protection device is a watchdog circuit.
5. A system for management controller failure protection in a server, comprising a server, characterized in that the debug serial port signal output of the server's management controller is connected to the signal input of a failure protection device, and the reset signal output of the failure protection device is connected to the reset signal input of the management controller; the failure protection device has a comparison module and a timing module; the comparison module evaluates the data sent from the debug serial port signal output of the management controller: when the data is initialization information of the management controller, no action is taken; when the data is the specified data that the management controller sends at intervals, the timing module starts timing, and its timer value is cleared the next time the specified data is received; when the timer value reaches the preset value without the failure protection device having received the specified data from the debug serial port signal output of the management controller, the failure protection device sends a reset signal that resets the management controller.
6. The system for management controller failure protection in a server according to claim 5, characterized in that the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
7. The system for management controller failure protection in a server according to claim 5, characterized in that the preset value of the timing module is 20 to 40 seconds.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710517705.4A CN107346269A (en) | 2017-06-29 | 2017-06-29 | Method and system for management controller failure protection in a server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710517705.4A CN107346269A (en) | 2017-06-29 | 2017-06-29 | Method and system for management controller failure protection in a server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107346269A (en) | 2017-11-14 |
Family
ID=60257204
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710517705.4A Pending CN107346269A (en) | Method and system for management controller failure protection in a server | 2017-06-29 | 2017-06-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107346269A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022007414A1 (en) * | 2020-07-10 | 2022-01-13 | 苏州浪潮智能科技有限公司 | Server fan control device and method based on control chip |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1506825A (en) * | 2002-12-10 | 2004-06-23 | 深圳市中兴通讯股份有限公司 | Real-time adjustable reset method and device for watch dog |
US7137036B2 (en) * | 2002-02-22 | 2006-11-14 | Oki Electric Industry Co., Ltd. | Microcontroller having an error detector detecting errors in itself as well |
CN103713916A (en) * | 2012-10-09 | 2014-04-09 | 华平信息技术股份有限公司 | Automatic application program running method and automatic application program running system in Windows embedded system |
CN104049702A (en) * | 2014-06-16 | 2014-09-17 | 京信通信系统(中国)有限公司 | Single chip microcomputer-based CPU (Central Processing Unit) reset control system, method and device |
- 2017-06-29: CN application CN201710517705.4A (CN107346269A), status Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7137036B2 (en) * | 2002-02-22 | 2006-11-14 | Oki Electric Industry Co., Ltd. | Microcontroller having an error detector detecting errors in itself as well |
CN1506825A (en) * | 2002-12-10 | 2004-06-23 | 深圳市中兴通讯股份有限公司 | Real-time adjustable reset method and device for watch dog |
CN103713916A (en) * | 2012-10-09 | 2014-04-09 | 华平信息技术股份有限公司 | Automatic application program running method and automatic application program running system in Windows embedded system |
CN104049702A (en) * | 2014-06-16 | 2014-09-17 | 京信通信系统(中国)有限公司 | Single chip microcomputer-based CPU (Central Processing Unit) reset control system, method and device |
Non-Patent Citations (1)
Title |
---|
李观文、衣平、邓英华: "《看门狗技术在改善系统可靠性中的应用》", 《机床电器》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102508755B (en) | | Device and method for simulating interface card hot-plugging
CN106610712B (en) | | Substrate management controller resetting system and method
CN105388982B (en) | | Multiprocessor electrification reset circuit
CN104734904B (en) | | The automatic test approach and system of bypass equipment
CN104135398A (en) | | Intelligent RS485 concentrator and bus deadlock detection method
CN102831084A (en) | | Controller and controlling method for re-identifying USB (universal serial bus) equipment
CN102955136A (en) | | Assistant detection circuit and assistant detection method for redundant power sources
CN103645730A (en) | | Motion control card with self-checking function and detection method
CN100371901C (en) | | Fault filling method and apparatus based on programmable logical device
CN111366316A (en) | | System and method for detecting liquid in server and server
CN112099412A (en) | | Safety redundancy architecture of micro control unit
CN102780207B (en) | | voltage protection system and method
CN103777617B (en) | | Upper and lower computer communication monitoring method
CN107346269A (en) | | Method and system for management controller failure protection in a server
CN104572331B (en) | | The monitoring module enabled with power monitoring and delayed
CN101650702B (en) | | On-line USB communication maintenance device and method
CN104133759A (en) | | Method and device for realizing extension module removal
CN109726055B (en) | | Method for detecting PCIe chip abnormity and computer equipment
CN102074274A (en) | | Method for detecting errors of and automatically resetting encryption chip in encryption card
JP2012068907A (en) | | Bus connection circuit and bus connection method
CN106919493A (en) | | Electric fault monitoring system and method on a kind of server
CN202758347U (en) | | Controller of re-identifying universal serial bus (USB) device
CN102810840B (en) | | Voltage protection system
CN107918069A (en) | | System and method are tested in a kind of power down
CN107179911A (en) | | A kind of method and apparatus for restarting management engine
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20171114 |