CN105117300A - Apparatus for realizing high availability of heartbeat - Google Patents

Apparatus for realizing high availability of heartbeat

Info

Publication number
CN105117300A
Authority
CN
China
Prior art keywords
heartbeat
high availability
watchdog
module
watchdog module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510493574.1A
Other languages
Chinese (zh)
Inventor
李延彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510493574.1A
Publication of CN105117300A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The present invention discloses an apparatus for realizing high availability of heartbeat. The apparatus comprises a heartbeat component and a watchdog module, wherein the heartbeat component is used for performing a write operation on the watchdog module at a predetermined time interval; the watchdog module is used for triggering an operation of re-booting a system when the write operation is not performed within a preset period; and the predetermined time interval is shorter than or equal to the preset period. According to the apparatus for realizing high availability of heartbeat, provided by the present invention, when heartbeat abnormally terminates or the system fails, the watchdog module can automatically re-boot the system, release cluster resources and avoid the occurrence of data conflict, thereby improving the high-availability of heartbeat.

Description

An apparatus for realizing high availability of heartbeat
Technical field
The present invention relates to the technical field of computer storage, and in particular to an apparatus for realizing high availability of heartbeat.
Background art
Heartbeat is a component of the Linux-HA project that implements a high-availability cluster system. The heartbeat service and cluster communication are the two key components of a high-availability cluster; in the Heartbeat project, both functions are implemented by the heartbeat module.
However, heartbeat can only perform heartbeat monitoring and resource takeover; it cannot monitor the resources or applications it controls. To check whether a resource or application is running normally, a third-party plug-in such as ipfail, Mon, or Ldirector must be used. Likewise, heartbeat cannot detect problems in the operating system itself. If heartbeat terminates abnormally or the system fails, service may be interrupted. In addition, the primary node cannot release its resources while the backup node has already taken them over, so the two nodes contend for the same resources at the same time and a data conflict results.
To avoid the above technical problem, the present invention provides an apparatus for realizing high availability of heartbeat.
Summary of the invention
The object of the present invention is to provide an apparatus for realizing high availability of heartbeat, so as to solve the problem in the prior art that heartbeat easily causes data conflicts.
To solve the above technical problem, the present invention provides an apparatus for realizing high availability of heartbeat, comprising a heartbeat component and a watchdog module;
wherein the heartbeat component is used for performing a write operation on the watchdog module at a predetermined time interval;
the watchdog module is used for triggering an operation of rebooting the system when no write operation is performed on it within a preset period; and the predetermined time interval is shorter than or equal to the preset period.
Optionally, the apparatus further comprises:
a starting module, used for starting the watchdog module so that it enters a working state.
Optionally, the watchdog module is implemented by a hardware timer independent of the kernel.
Optionally, the hardware timer is a MAX813, 5045, or IMP813 chip.
Optionally, the watchdog module is implemented by a kernel module in combination with a timer.
Optionally, the preset period is one minute.
Optionally, the Linux kernel provides a corresponding driver for the watchdog module.
Optionally, only one driver of the watchdog module is loaded at any one time.
Optionally, the apparatus further comprises:
a prompting module, used for prompting the user with information that the system is to be rebooted.
In the apparatus for realizing high availability of heartbeat provided by the present invention, the heartbeat component performs a write operation on the watchdog module at a predetermined time interval, and the watchdog module triggers an operation of rebooting the system when no write operation is performed on it within the preset period. Thus, when heartbeat terminates abnormally or the system fails, the watchdog module can automatically reboot the system and release cluster resources, avoiding data conflicts and thereby improving the high availability of heartbeat.
Brief description of the drawings
Fig. 1 is a schematic diagram of one embodiment of the apparatus for realizing high availability of heartbeat provided by the present invention.
Detailed description of the embodiments
The core of the present invention is to provide an apparatus for realizing high availability of heartbeat by integrating watchdog into heartbeat. When heartbeat terminates abnormally or the system fails, watchdog can automatically reboot the system, thereby releasing cluster resources and avoiding data conflicts.
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawing and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the present invention.
As shown in Fig. 1, which is a schematic diagram of one embodiment of the apparatus for realizing high availability of heartbeat provided by the present invention, the apparatus comprises:
a heartbeat component 1 and a watchdog module 2;
wherein the heartbeat component 1 is used for performing a write operation on the watchdog module 2 at a predetermined time interval;
the watchdog module 2 is used for triggering an operation of rebooting the system when no write operation is performed on it within a preset period; and the predetermined time interval is shorter than or equal to the preset period.
In the apparatus for realizing high availability of heartbeat provided by the present invention, the heartbeat component performs a write operation on the watchdog module at a predetermined time interval, and the watchdog module triggers an operation of rebooting the system when no write operation is performed on it within the preset period. Thus, when heartbeat terminates abnormally or the system fails, the watchdog module can automatically reboot the system and release cluster resources, avoiding data conflicts and thereby improving the high availability of heartbeat.
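The interplay between the two components can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class SimulatedWatchdog, the function first_reboot_time, and the use of whole-second ticks are hypothetical names and simplifications introduced here, not part of the described apparatus.

```python
from typing import Optional


class SimulatedWatchdog:
    """Toy model of the watchdog module (hypothetical, for illustration only)."""

    def __init__(self, preset_period: int) -> None:
        self.preset_period = preset_period
        self.last_write = 0  # time of the most recent write operation

    def write(self, now: int) -> None:
        """Record a write operation performed by the heartbeat component."""
        self.last_write = now

    def must_reboot(self, now: int) -> bool:
        """True once more than preset_period has passed without a write."""
        return now - self.last_write > self.preset_period


def first_reboot_time(interval: int, preset_period: int,
                      heartbeat_dies_at: int, horizon: int) -> Optional[int]:
    """Return the tick at which a reboot is triggered, or None if it never is."""
    if interval > preset_period:
        raise ValueError("interval must be shorter than or equal to the preset period")
    dog = SimulatedWatchdog(preset_period)
    for now in range(horizon + 1):
        # The heartbeat component writes every `interval` ticks while it is alive.
        if now < heartbeat_dies_at and now % interval == 0:
            dog.write(now)
        if dog.must_reboot(now):
            return now
    return None


if __name__ == "__main__":
    # Heartbeat writes every 10 s, watchdog period is 60 s, heartbeat dies at t=35.
    print(first_reboot_time(10, 60, 35, 200))    # 91
    # A healthy heartbeat never lets the watchdog expire.
    print(first_reboot_time(10, 60, 1000, 200))  # None
```

In the first call the last write happens at t=30, so the simulated watchdog fires at the first tick more than 60 seconds later. The ValueError mirrors the condition that the predetermined time interval must not exceed the preset period.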
As one embodiment, the apparatus for realizing high availability of heartbeat provided by the present invention may further comprise:
a starting module, used for starting the watchdog module so that it enters a working state.
As one embodiment, in the apparatus for realizing high availability of heartbeat provided by the present invention, the watchdog module may be implemented either as a hardware circuit or as a software timer, and can automatically reset the system when the system fails.
When the watchdog module is a hardware circuit, it can be implemented independently of the kernel. Specifically, it may be one of the MAX813, 5045, or IMP813 chips. It should be noted that the watchdog module can take many forms and is not limited to these; none of this affects the implementation of the present invention.
In this embodiment, the preset period may specifically be 1 minute; other values may of course be chosen according to the user's needs.
Under the Linux kernel, the basic working principle of watchdog is as follows: after watchdog is started (that is, after the /dev/watchdog device is opened), if no write operation is performed on /dev/watchdog within a set time interval (1 minute by default), the hardware watchdog circuit or the software timer reboots the system. Here, /dev/watchdog is a character device node with major device number 10 and minor device number 130.
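A minimal sketch of a process that keeps the watchdog fed is shown below. The function names is_watchdog_node and keep_alive, and their parameters, are hypothetical. The device path is passed in so that the sketch can be tried against an ordinary file instead of a real watchdog device.

```python
import os
import stat
import time

# Device numbers as stated in the text above.
WATCHDOG_MAJOR = 10
WATCHDOG_MINOR = 130


def is_watchdog_node(path: str) -> bool:
    """Check that `path` is a character device with the expected device numbers."""
    st = os.stat(path)
    return (stat.S_ISCHR(st.st_mode)
            and os.major(st.st_rdev) == WATCHDOG_MAJOR
            and os.minor(st.st_rdev) == WATCHDOG_MINOR)


def keep_alive(path: str, interval: float, writes: int) -> None:
    """Open the device and perform `writes` write operations, `interval` seconds apart.

    Opening a real /dev/watchdog starts the timer; each write restarts the
    countdown. What happens when the device is closed depends on the driver.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(writes):
            os.write(fd, b"\0")  # any write counts as a keep-alive
            time.sleep(interval)
    finally:
        os.close(fd)
```

On a test machine, keep_alive can be pointed at a temporary file to observe the write pattern without arming a real watchdog; running it against /dev/watchdog requires root privileges and will reboot the machine if the writes stop.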
The Linux kernel not only provides drivers for various types of watchdog hardware circuits, but also provides a pure software watchdog driver based on a timer. The driver source code is located in the drivers/char/watchdog directory of the kernel source tree.
Specifically, the watchdog option can be enabled in the /etc/ha.d/ha.cf configuration file. Heartbeat then writes to the /dev/watchdog file (or device) at an interval equal to the deadtime. Consequently, if anything prevents Heartbeat from updating the watchdog device, watchdog triggers a kernel panic once the watchdog timeout period (one minute by default) expires. In this embodiment, the kernel panic is configured to reboot the system.
A hardware watchdog requires hardware circuit support: the device node /dev/watchdog corresponds to a real physical device, and each type of hardware watchdog device is managed by its corresponding hardware driver. A software watchdog, by contrast, is implemented by the kernel module softdog.ko using the timer mechanism; in this case /dev/watchdog does not correspond to a real physical device and merely gives applications the same interface as a hardware watchdog.
As one embodiment, only one watchdog driver module can be loaded at any one time to manage the /dev/watchdog device node. If the system has no hardware watchdog circuit, the software watchdog driver softdog.ko can be loaded.
In the apparatus for realizing high availability of heartbeat provided by the present invention, adding the line "watchdog /dev/watchdog" to the /etc/ha.d/ha.cf configuration file automatically enables the watchdog function.
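The configuration step can be checked with a short script. This is a sketch only: parse_ha_cf, watchdog_device, and deadtime_fits are hypothetical helper names, the sample values are illustrative, and the parser handles only plain integer deadtime values in seconds and keeps only the last occurrence of a repeated directive.

```python
from typing import Dict, Optional

# Illustrative ha.cf fragment; the values are examples, not recommendations.
SAMPLE = """
# heartbeat timing
keepalive 2
deadtime 30
watchdog /dev/watchdog
"""


def parse_ha_cf(text: str) -> Dict[str, str]:
    """Collect 'directive value' pairs, ignoring blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def watchdog_device(text: str) -> Optional[str]:
    """Return the configured watchdog device, or None if watchdog is not enabled."""
    return parse_ha_cf(text).get("watchdog")


def deadtime_fits(text: str, watchdog_timeout_s: int = 60) -> Optional[bool]:
    """True if the write interval (deadtime) does not exceed the watchdog timeout."""
    deadtime = parse_ha_cf(text).get("deadtime")
    if deadtime is None or not deadtime.isdigit():
        return None  # missing, or in a format this sketch does not handle
    return int(deadtime) <= watchdog_timeout_s


if __name__ == "__main__":
    print(watchdog_device(SAMPLE))
    print(deadtime_fits(SAMPLE))
```

The second check reflects the requirement that the write interval be no longer than the watchdog period; the default of 60 seconds matches the one-minute timeout mentioned in the text.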
Suppose the heartbeat process is killed on the primary node with the "killall -9 heartbeat" command. Because the heartbeat process is terminated abnormally, the resources it controls are not released. After a short period without any response from the primary node, the backup node concludes that the primary node has failed and takes over its resources. Resource contention then occurs: both nodes hold the same resource, which causes a data conflict.
To address this situation, the apparatus for realizing high availability of heartbeat provided by the present invention integrates watchdog, the kernel monitoring module provided by Linux, into Heartbeat. If Heartbeat terminates abnormally or the system fails, watchdog can automatically reboot the system, thereby releasing cluster resources and avoiding data conflicts.
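The sequence of events can be traced with a toy model. The function simulate and the node labels are hypothetical and introduced only for illustration; the model tracks which nodes hold the shared resource after each step and ignores timing.

```python
from typing import List, Tuple


def simulate(watchdog_enabled: bool) -> List[Tuple[str, List[str]]]:
    """Return (event, nodes holding the shared resource) after each step."""
    holders = {"primary"}
    timeline = [("primary serving", sorted(holders))]

    # heartbeat is killed with SIGKILL, so it cannot release its resources.
    timeline.append(("heartbeat killed on primary", sorted(holders)))

    # The backup stops hearing from the primary and takes the resource over.
    holders.add("backup")
    timeline.append(("backup takes over", sorted(holders)))

    if watchdog_enabled:
        # Nothing writes to the watchdog any more, so it reboots the primary,
        # which drops everything the primary was holding.
        holders.discard("primary")
        timeline.append(("watchdog reboots primary", sorted(holders)))
    return timeline


if __name__ == "__main__":
    for enabled in (False, True):
        print("watchdog enabled:", enabled)
        for event, nodes in simulate(enabled):
            print("  ", event, "->", nodes)
```

Without the watchdog the trace ends with both nodes holding the resource; with it, only the backup node remains as holder.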
This embodiment may further comprise:
a prompting module, used for prompting the user with information that the system is to be rebooted,
for example "the system will restart immediately". Rebooting in this way releases cluster resources and strengthens the high availability of the system.
The apparatus for realizing high availability of heartbeat provided by the present invention can effectively monitor its own health through the watchdog mechanism. Once a node fails, a hard reset of the kernel is triggered within the specified time, so that the resources held by the node are released promptly. This also prevents cluster "split-brain" and improves the high availability of heartbeat.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be understood by reference to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. An apparatus for realizing high availability of heartbeat, characterized by comprising a heartbeat component and a watchdog module;
wherein the heartbeat component is used for performing a write operation on the watchdog module at a predetermined time interval;
the watchdog module is used for triggering an operation of rebooting the system when no write operation is performed on it within a preset period; and the predetermined time interval is shorter than or equal to the preset period.
2. The apparatus for realizing high availability of heartbeat according to claim 1, characterized by further comprising:
a starting module, used for starting the watchdog module so that it enters a working state.
3. The apparatus for realizing high availability of heartbeat according to claim 2, characterized in that the watchdog module is implemented by a hardware timer independent of the kernel.
4. The apparatus for realizing high availability of heartbeat according to claim 3, characterized in that the hardware timer is a MAX813, 5045, or IMP813 chip.
5. The apparatus for realizing high availability of heartbeat according to claim 2, characterized in that the watchdog module is implemented by a kernel module in combination with a timer.
6. The apparatus for realizing high availability of heartbeat according to claim 2, characterized in that the preset period is one minute.
7. The apparatus for realizing high availability of heartbeat according to claim 6, characterized in that the Linux kernel provides a corresponding driver for the watchdog module.
8. The apparatus for realizing high availability of heartbeat according to any one of claims 1 to 7, characterized in that only one driver of the watchdog module is loaded at any one time.
9. The apparatus for realizing high availability of heartbeat according to claim 8, characterized by further comprising:
a prompting module, used for prompting the user with information that the system is to be rebooted.
CN201510493574.1A 2015-08-12 2015-08-12 Apparatus for realizing high availability of heartbeat Pending CN105117300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510493574.1A CN105117300A (en) 2015-08-12 2015-08-12 Apparatus for realizing high availability of heartbeat

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510493574.1A CN105117300A (en) 2015-08-12 2015-08-12 Apparatus for realizing high availability of heartbeat

Publications (1)

Publication Number Publication Date
CN105117300A true CN105117300A (en) 2015-12-02

Family

ID=54665300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510493574.1A Pending CN105117300A (en) 2015-08-12 2015-08-12 Apparatus for realizing high availability of heartbeat

Country Status (1)

Country Link
CN (1) CN105117300A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105634813A (en) * 2016-01-04 2016-06-01 浪潮电子信息产业股份有限公司 Method for automatically switching nodes under dual-computer environment based on network
CN107528724A (en) * 2017-07-20 2017-12-29 北京奇安信科技有限公司 A kind of optimized treatment method and device of node cluster
CN107577575A (en) * 2017-09-06 2018-01-12 长沙曙通信息科技有限公司 A kind of disaster tolerant backup system management of monitor implementation method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101464811A (en) * 2008-12-29 2009-06-24 艾默生网络能源有限公司 Multitask monitoring management system
CN101980171A (en) * 2010-10-08 2011-02-23 广东威创视讯科技股份有限公司 Failure self-recovery method for software system and software watchdog system used by same

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105634813A (en) * 2016-01-04 2016-06-01 浪潮电子信息产业股份有限公司 Method for automatically switching nodes under dual-computer environment based on network
CN107528724A (en) * 2017-07-20 2017-12-29 北京奇安信科技有限公司 A kind of optimized treatment method and device of node cluster
CN107528724B (en) * 2017-07-20 2020-09-29 奇安信科技集团股份有限公司 Optimization processing method and device for node cluster
CN107577575A (en) * 2017-09-06 2018-01-12 长沙曙通信息科技有限公司 A kind of disaster tolerant backup system management of monitor implementation method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151202
