CN107346269A - Method and system for management controller failure protection in a server - Google Patents

Method and system for management controller failure protection in a server

Info

Publication number
CN107346269A
Authority
CN
China
Prior art keywords
management controller
protecting device
failure
sent
timing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710517705.4A
Other languages
Chinese (zh)
Inventor
程万前
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710517705.4A
Publication of CN107346269A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Programmable Controllers (AREA)

Abstract

A method and system for management controller failure protection in a server. First, the debug signal output of the server's management controller is connected to the signal input of a failure protection device, and the reset signal output of the failure protection device is connected to the reset signal input of the management controller. The failure protection device then evaluates the data sent by the management controller. If the data is initialization information of the management controller, no action is taken. If the data is the specified data that the management controller sends at intervals, the failure protection device starts timing, and clears the timer value the next time it receives the same specified data. If the timer value reaches the preset value without the failure protection device having received the specified data from the management controller, the device sends a reset signal to reset the management controller. The method and system of the invention shorten the time needed to reset and recover the management controller after it fails, increase the time the management controller spends operating normally, and improve the reliability of the server.

Description

Method and system for management controller failure protection in a server
Technical field
The present invention relates to a server protection system in the field of computer technology, and in particular to a method and system for management controller failure protection in a server.
Background
A programmable controller in a server may malfunction during operation, for example by failing to execute its preset program. When this happens, the programmable controller must be reset. The prior art handles this as follows: the reset signal of the programmable controller is connected to a watchdog circuit or device, and the programmable controller continuously sends pulse signals to the watchdog circuit. When the programmable controller fails, its program stops executing and the pulse signals stop. The watchdog circuit times continuously and clears its timer value whenever it receives a pulse signal. If no pulse arrives, the timer is not cleared in time and reaches the preset value, whereupon the watchdog circuit sends a reset signal to the programmable controller to restore normal operation.
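The conventional watchdog described above can be modelled in a few lines. This is a minimal sketch rather than the circuit of any particular product: the class name `ConventionalWatchdog`, the one-second tick, and the method names are illustrative assumptions that do not appear in the patent.

```python
class ConventionalWatchdog:
    """Illustrative model: counts seconds since the last pulse and
    issues a reset when the count reaches the preset value."""

    def __init__(self, timeout_s: int) -> None:
        self.timeout_s = timeout_s  # preset value, in seconds
        self.elapsed_s = 0          # current timer value
        self.resets = 0             # number of resets issued so far

    def pulse(self) -> None:
        # A pulse from the programmable controller clears the timer.
        self.elapsed_s = 0

    def tick(self) -> bool:
        # Advance one second; return True if a reset is issued on this tick.
        self.elapsed_s += 1
        if self.elapsed_s >= self.timeout_s:
            self.elapsed_s = 0
            self.resets += 1
            return True
        return False
```

As long as `pulse()` is called more often than every `timeout_s` ticks, `tick()` never returns True; once the pulses stop, the reset follows after at most `timeout_s` seconds.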
In server design, a management controller is commonly used to monitor and control the server's fans, voltages, power consumption, error information, and so on. The management controller is one kind of programmable controller as described above, so to ensure server reliability it must likewise be reset when it fails.
The management controller must initialize when it powers on, and initialization takes some time, typically more than one minute. If the prior art described above is used to monitor and reset the management controller, the watchdog timeout must exceed the time that initialization requires. Otherwise the management controller would be reset by the watchdog circuit during power-on, before initialization completes, and would never be able to start. The drawback of a long timeout is that when the management controller fails, it must wait a long time before being reset and returning to normal operation. During that time the management controller cannot monitor important server information, which harms the reliability of the server.
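The start-up problem can be illustrated with a small simulation. It is a sketch under stated assumptions: the watchdog runs from power-on, the controller sends no pulses until initialization finishes, and time advances in one-second steps. The function name `simulate_boot` and its parameters are hypothetical.

```python
from typing import Tuple


def simulate_boot(init_s: int, timeout_s: int, max_s: int = 600) -> Tuple[bool, int]:
    """Return (booted, resets) for a conventional watchdog during power-on.

    Assumption: no pulses are sent while the controller is initializing.
    """
    progress = 0  # seconds of initialization completed since the last reset
    elapsed = 0   # watchdog timer value
    resets = 0
    for _ in range(max_s):
        progress += 1
        elapsed += 1
        if progress >= init_s:
            return True, resets  # initialization finished before a reset
        if elapsed >= timeout_s:
            progress = 0         # reset throws away all initialization progress
            elapsed = 0
            resets += 1
    return False, resets
```

With a 60-second initialization, a 30-second timeout never lets the controller start, while a 90-second timeout does, at the cost of waiting up to 90 seconds before a later failure is corrected.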
Summary of the invention
The present invention provides a method and system for management controller failure protection in a server, to solve the prior-art problem that the management controller's watchdog timeout is too long, so that reset and recovery take a long time and the management controller is at risk of being unable to monitor the server.
The present invention is achieved through the following technical solutions:
A method for management controller failure protection in a server, comprising the following steps:
S1. Connect the debug serial port signal output of the server's management controller to the signal input of the failure protection device, and connect the reset signal output of the failure protection device to the reset signal input of the management controller, so that the failure protection device can control the reset of the management controller.
S2. While the management controller is initializing, it continuously sends its current initialization status to the failure protection device through the debug serial port signal output, to indicate how far initialization has progressed. Once the management controller has finished initializing and is operating normally, it sends specified data to the failure protection device at intervals through the debug serial port signal output.
S3. The failure protection device evaluates the data sent from the debug serial port signal output of the management controller. If the data is initialization information of the management controller, no action is taken. If the data is the specified data that the management controller sends at intervals, the failure protection device starts timing, and clears the timer value the next time it receives the same specified data. If the timer value reaches the preset value without the failure protection device having received the specified data from the debug serial port signal output of the management controller, the device sends a reset signal to reset the management controller.
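The decision logic of step S3 can be sketched as a function over a time-stamped message trace. This is an illustration only: the message kinds `INIT` and `HEARTBEAT`, the function name `first_reset_time`, and the choice to treat a message arriving exactly at the deadline as too late are assumptions, since the patent does not define a message format or the boundary case.

```python
from typing import Iterable, Optional, Tuple

INIT = "init"            # initialization status message (ignored by the device)
HEARTBEAT = "heartbeat"  # stands in for the "specified data" sent at intervals


def first_reset_time(events: Iterable[Tuple[float, str]],
                     preset_s: float,
                     end_s: float) -> Optional[float]:
    """Return the time at which the reset signal would be sent, or None.

    events: (timestamp_s, kind) pairs sorted by timestamp.
    end_s:  time at which observation stops.
    """
    deadline = None  # the timer is not running until the first heartbeat
    for t, kind in events:
        if deadline is not None and t >= deadline:
            return deadline          # timer reached the preset value first
        if kind == HEARTBEAT:
            deadline = t + preset_s  # start timing, or clear and restart
        # INIT messages neither start nor clear the timer.
    if deadline is not None and deadline <= end_s:
        return deadline
    return None
```

Because initialization messages never start the timer, the preset value can be chosen independently of how long initialization takes.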
In the method for management controller failure protection in a server described above, the preset timer value of the failure protection device in step S3 is 20 to 40 seconds.
In the method for management controller failure protection in a server described above, the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
In the method for management controller failure protection in a server described above, the failure protection device is a watchdog circuit.
A system for management controller failure protection in a server, including a server. The debug serial port signal output of the server's management controller is connected to the signal input of the failure protection device, and the reset signal output of the failure protection device is connected to the reset signal input of the management controller. The failure protection device is provided with a comparison module and a timing module. The comparison module evaluates the data sent from the debug serial port signal output of the management controller. If the data is initialization information of the management controller, no action is taken. If the data is the specified data that the management controller sends at intervals, the timing module starts timing, and the timer value of the timing module is cleared the next time the specified data is received. If the timer value reaches the preset value without the failure protection device having received the specified data from the debug serial port signal output of the management controller, the device sends a reset signal to reset the management controller.
In the system for management controller failure protection in a server described above, the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
In the system for management controller failure protection in a server described above, the preset timer value of the timing module is 20 to 40 seconds.
Compared with the prior art, the advantages of the present invention are as follows:
To address the excessively long watchdog timeout of the management controller in the prior art, the management controller sends different serial port information to the CPLD/FPGA during initialization and during normal operation. The CPLD/FPGA evaluates this information and decides whether to reset based on the result. The method and system of the invention shorten the time needed to reset and recover the management controller after it fails, increase the time the management controller spends operating normally, and improve the reliability of the server.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below.
Fig. 1 is an electrical schematic diagram of the system of the invention.
Fig. 2 is a flow chart of the method of the invention.
Reference numerals: 1 - management controller, 2 - serial data line, 3 - reset data line, 4 - failure protection device, 41 - comparison module, 42 - timing module.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention.
As shown in Fig. 1, the system for management controller failure protection in a server of this embodiment includes a server. The debug signal output of the server's management controller 1 is connected to the signal input of the failure protection device 4 through the serial data line 2, and the reset signal output of the failure protection device 4 is connected to the reset signal input of the management controller 1 through the reset data line 3, so that the failure protection device 4 can reset the management controller 1.
The failure protection device 4 is provided with a comparison module 41 and a timing module 42. The comparison module 41 evaluates the data sent from the debug serial port of the management controller 1. If the debug serial port sends initialization information of the management controller 1, no action is taken. If the debug serial port sends the specified data that the management controller 1 transmits at intervals after completing initialization, the timing module 42 starts timing, and the timer value of the timing module 42 is cleared the next time the specified data is received. If the timer value reaches the preset value without the failure protection device 4 having received the specified data from the management controller 1, the management controller 1 is reset by the reset signal. Further, the preset timer value of the timing module 42 is 30 seconds.
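The split into a comparison module and a timing module can be sketched in software as follows. In the embodiment these would be logic inside a CPLD or FPGA, so the Python below only models behaviour. The heartbeat payload `BMC_ALIVE`, the class and method names, and the choice to stop the timer after a reset until the next heartbeat arrives are all assumptions not stated in the patent.

```python
HEARTBEAT_TOKEN = b"BMC_ALIVE"  # hypothetical payload; the patent does not define one


class ComparisonModule:
    """Decides whether a received serial line is the specified data."""

    def is_specified_data(self, line: bytes) -> bool:
        return line.strip() == HEARTBEAT_TOKEN


class TimingModule:
    """Second counter that stays idle until it is started."""

    def __init__(self, preset_s: int = 30) -> None:
        self.preset_s = preset_s
        self.running = False
        self.count_s = 0

    def start_or_clear(self) -> None:
        self.running = True
        self.count_s = 0

    def stop(self) -> None:
        self.running = False
        self.count_s = 0

    def tick(self) -> bool:
        # Returns True when the timer value reaches the preset value.
        if not self.running:
            return False
        self.count_s += 1
        return self.count_s >= self.preset_s


class FailureProtectionDevice:
    def __init__(self, preset_s: int = 30) -> None:
        self.compare = ComparisonModule()
        self.timer = TimingModule(preset_s)

    def on_serial_line(self, line: bytes) -> None:
        if self.compare.is_specified_data(line):
            self.timer.start_or_clear()
        # Anything else is treated as initialization information: no action.

    def on_second(self) -> bool:
        # Returns True when the reset signal should be asserted.
        if self.timer.tick():
            self.timer.stop()  # assumption: idle until the next heartbeat
            return True
        return False
```

Stopping the timer after a reset means the controller's subsequent re-initialization, however long it takes, cannot trigger a second reset in this model.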
The present invention also provides a method for management controller failure protection in a server, comprising the following steps:
First, the debug signal output of the server's management controller 1 is connected to the signal input of the failure protection device 4, and the reset signal output of the failure protection device 4 is connected to the reset signal input of the management controller 1, so that the failure protection device 4 can control the reset of the management controller 1.
While the management controller 1 is initializing, it continuously sends its current initialization status to the failure protection device 4 through the debug serial port, to indicate how far the initialization of the management controller 1 has progressed. Once the management controller 1 has finished initializing and is operating normally, it sends the specified data to the failure protection device 4 at intervals through the debug serial port.
The failure protection device 4 then evaluates the data sent from the debug serial port of the management controller 1. If the debug serial port sends initialization information of the management controller 1, no action is taken. If the debug serial port sends the specified data that the management controller 1 transmits at intervals after completing initialization, the failure protection device 4 starts timing, and clears its timer value the next time it receives the same specified data. If the timer value reaches the preset value without the failure protection device 4 having received the specified data from the management controller 1, the management controller 1 is reset by the reset signal.
The failure protection device 4 is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA), in which the comparison module and the timing module are provided.
Specifically, as shown in Fig. 2, in this embodiment the management controller 1 continuously sends its current initialization status through the debug serial port while initializing, indicating which step of initialization it has reached. Once the management controller has completed initialization and is operating normally, it sends the specified data to the CPLD/FPGA through the debug serial port at predetermined intervals.
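The controller side of this exchange can be sketched as a generator of time-stamped serial messages. The message texts (`INIT i/n name`, `BMC_ALIVE`), the fixed duration per initialization step, and the function name are illustrative assumptions; the patent only states that initialization status is sent during start-up and specified data at intervals afterwards.

```python
from typing import Iterator, List, Tuple


def controller_serial_output(init_steps: List[str],
                             step_duration_s: int,
                             heartbeat_interval_s: int,
                             run_for_s: int) -> Iterator[Tuple[int, str]]:
    """Yield (time_s, message) pairs written to the debug serial port."""
    t = 0
    # During initialization: report which step has been reached.
    for i, step in enumerate(init_steps, start=1):
        yield t, f"INIT {i}/{len(init_steps)} {step}"
        t += step_duration_s
    # After initialization: send the specified data at a fixed interval.
    while t <= run_for_s:
        yield t, "BMC_ALIVE"
        t += heartbeat_interval_s
```

For the scheme to work, `heartbeat_interval_s` would need to be shorter than the preset timer value of the failure protection device, so that a healthy controller always clears the timer in time.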
The CPLD/FPGA evaluates the data sent over the debug serial port. If the debug serial port sends initialization information of the management controller 1, no action is taken. If the debug serial port sends the specified data that the management controller 1 transmits periodically after completing initialization, the CPLD/FPGA starts timing, and clears the timer value each time the specified data is received again. When the timer value reaches the preset value, the CPLD/FPGA resets the management controller 1 with the reset signal.
The method and system of the invention shorten the time needed to reset and recover the management controller after it fails, increase the time the management controller 1 spends operating normally, and improve the reliability of the server.
Technical content not described in detail in the present invention is known art.

Claims (7)

1. A method for management controller failure protection in a server, characterized by comprising the following steps:
S1. connecting the debug serial port signal output of the server's management controller to the signal input of a failure protection device, and connecting the reset signal output of the failure protection device to the reset signal input of the management controller, so that the failure protection device can control the reset of the management controller;
S2. while the management controller is initializing, the management controller continuously sends its current initialization status to the failure protection device through the debug serial port signal output, to indicate how far initialization has progressed; once the management controller has finished initializing and is operating normally, the management controller sends specified data to the failure protection device at intervals through the debug serial port signal output;
S3. the failure protection device evaluates the data sent from the debug serial port signal output of the management controller; if the data is initialization information of the management controller, no action is taken; if the data is the specified data that the management controller sends at intervals, the failure protection device starts timing, and clears its timer value the next time it receives the same specified data; if the timer value reaches the preset value without the failure protection device having received the specified data from the debug serial port signal output of the management controller, the device sends a reset signal to reset the management controller.
2. The method for management controller failure protection in a server according to claim 1, characterized in that the preset timer value of the failure protection device in step S3 is 20 to 40 seconds.
3. The method for management controller failure protection in a server according to claim 1, characterized in that the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
4. The method for management controller failure protection in a server according to claim 1, characterized in that the failure protection device is a watchdog circuit.
5. A system for management controller failure protection in a server, including a server, characterized in that the debug serial port signal output of the server's management controller is connected to the signal input of a failure protection device, and the reset signal output of the failure protection device is connected to the reset signal input of the management controller; the failure protection device is provided with a comparison module and a timing module; the comparison module evaluates the data sent from the debug serial port signal output of the management controller; if the data is initialization information of the management controller, no action is taken; if the data is the specified data that the management controller sends at intervals, the timing module starts timing, and the timer value of the timing module is cleared the next time the specified data is received; if the timer value reaches the preset value without the failure protection device having received the specified data from the debug serial port signal output of the management controller, the device sends a reset signal to reset the management controller.
6. The system for management controller failure protection in a server according to claim 5, characterized in that the failure protection device is a complex programmable logic device (CPLD) or a field-programmable gate array (FPGA).
7. The system for management controller failure protection in a server according to claim 5, characterized in that the preset timer value of the timing module is 20 to 40 seconds.
CN201710517705.4A 2017-06-29 2017-06-29 Method and system for management controller failure protection in a server Pending CN107346269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710517705.4A CN107346269A (en) 2017-06-29 2017-06-29 The method and system of controller failure protection are managed in a kind of server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710517705.4A CN107346269A (en) 2017-06-29 2017-06-29 The method and system of controller failure protection are managed in a kind of server

Publications (1)

Publication Number Publication Date
CN107346269A true CN107346269A (en) 2017-11-14

Family

ID=60257204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710517705.4A Pending CN107346269A (en) 2017-06-29 2017-06-29 Method and system for management controller failure protection in a server

Country Status (1)

Country Link
CN (1) CN107346269A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022007414A1 (en) * 2020-07-10 2022-01-13 苏州浪潮智能科技有限公司 Server fan control device and method based on control chip

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1506825A (en) * 2002-12-10 2004-06-23 深圳市中兴通讯股份有限公司 Real-time adjustable reset method and device for watch dog
US7137036B2 (en) * 2002-02-22 2006-11-14 Oki Electric Industry Co., Ltd. Microcontroller having an error detector detecting errors in itself as well
CN103713916A (en) * 2012-10-09 2014-04-09 华平信息技术股份有限公司 Automatic application program running method and automatic application program running system in Windows embedded system
CN104049702A (en) * 2014-06-16 2014-09-17 京信通信***(中国)有限公司 Single chip microcomputer-based CPU (Central Processing Unit) reset control system, method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李观文、衣平、邓英华: "《看门狗技术在改善***可靠性中的应用》", 《机床电器》 *

Similar Documents

Publication Publication Date Title
CN102508755B (en) Device and method for simulating interface card hot-plugging
CN106610712B (en) Substrate management controller resetting system and method
CN105388982B (en) Multiprocessor electrification reset circuit
CN104734904B (en) The automatic test approach and system of bypass equipment
CN104135398A (en) Intelligent RS485 concentrator and bus deadlock detection method
CN102831084A (en) Controller and controlling method for re-identifying USB (universal serial bus) equipment
CN102955136A (en) Assistant detection circuit and assistant detection method for redundant power sources
CN103645730A (en) Motion control card with self-checking function and detection method
CN100371901C (en) Fault filling method and apparatus based on programmable logical device
CN111366316A (en) System and method for detecting liquid in server and server
CN112099412A (en) Safety redundancy architecture of micro control unit
CN102780207B (en) voltage protection system and method
CN103777617B (en) Upper and lower computer communication monitoring method
CN107346269A (en) The method and system of controller failure protection are managed in a kind of server
CN104572331B (en) The monitoring module enabled with power monitoring and delayed
CN101650702B (en) On-line USB communication maintenance device and method
CN104133759A (en) Method and device for realizing extension module removal
CN109726055B (en) Method for detecting PCIe chip abnormity and computer equipment
CN102074274A (en) Method for detecting errors of and automatically resetting encryption chip in encryption card
JP2012068907A (en) Bus connection circuit and bus connection method
CN106919493A (en) Electric fault monitoring system and method on a kind of server
CN202758347U (en) Controller of re-identifying universal serial bus (USB) device
CN102810840B (en) Voltage protection system
CN107918069A (en) System and method are tested in a kind of power down
CN107179911A (en) A kind of method and apparatus for restarting management engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20171114)