CN106452696A - Control system of server cluster - Google Patents

Info

Publication number
CN106452696A
Authority
CN
China
Prior art keywords
controller
controller group
group
server cluster
control system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610970085.5A
Other languages
Chinese (zh)
Inventor
王佳琪
董海廷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201610970085.5A priority Critical patent/CN106452696A/en
Publication of CN106452696A publication Critical patent/CN106452696A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00: Arrangements for detecting or preventing errors in the information received
    • H04L1/22: Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0654: Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0659: Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a control system for a server cluster. The system comprises a first controller group, a second controller group and a third controller group, and every two controller groups are communicatively connected. The first controller group performs high-availability work in the server cluster to execute control tasks; the second controller group takes over the control tasks of the first controller group when the first controller group fails; and the third controller group, upon determining that the communication link between the first controller group and the second controller group has failed, decides according to the current working state of the first controller group whether the first controller group continues working or the second controller group takes over its control tasks. The technical solution provided by the embodiments of the invention prevents the second controller group from competing with the first controller group for shared resources, which would cause data inconsistency in the server cluster and affect its normal operation.

Description

A control system for a server cluster
Technical field
The present invention relates to the field of computer application technology, and in particular to a control system for a server cluster.
Background
With the rapid development of computer technology, more and more industries need to use server clusters. The normal operation of the control system in a server cluster plays a very important role for the cluster.
In the control system of a server cluster, the controllers are connected by physical links and communicate with each other. In practice, one of the controllers acts as the management controller, which holds the management rights over the whole control system. When a communication link between controllers fails while each individual controller still works normally, every controller assumes that the other controllers are offline and that it is the only online controller in the whole control system, and therefore that it should obtain the management rights over the whole control system. The controllers then compete for shared resources, and different controllers may perform their own read and write operations on those resources, which leads to data inconsistency. This situation is known as split-brain.
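The split-brain situation described above can be illustrated with a minimal sketch. It is not taken from the patent: the class names `Controller` and `SharedResource`, the controller names, and the naive rule "peer unreachable means I am the manager" are assumptions chosen only to show why two healthy controllers end up writing to the same resource.

```python
from dataclasses import dataclass, field


@dataclass
class SharedResource:
    """Hypothetical stand-in for storage that every controller can reach."""
    writes: list = field(default_factory=list)

    def write(self, owner: str, value: str) -> None:
        self.writes.append((owner, value))


@dataclass
class Controller:
    name: str
    peer_reachable: bool = True  # result of the last check over the link
    is_manager: bool = False

    def react_to_peer_state(self) -> None:
        # Naive rule: if the peer looks offline, assume sole management rights.
        if not self.peer_reachable:
            self.is_manager = True


def simulate_link_failure() -> SharedResource:
    resource = SharedResource()
    a = Controller("controller-a", is_manager=True)
    b = Controller("controller-b")
    # The link fails, but both controllers keep running normally.
    a.peer_reachable = False
    b.peer_reachable = False
    for c in (a, b):
        c.react_to_peer_state()
        if c.is_manager:
            resource.write(c.name, f"config written by {c.name}")
    return resource


resource = simulate_link_failure()
owners = sorted({owner for owner, _ in resource.writes})
print(owners)  # ['controller-a', 'controller-b']: two writers, one resource
```

Both controllers claim the management rights at once, which is exactly the condition that an external arbiter is meant to prevent.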
At present there is no satisfactory method for handling the cluster split-brain problem. How to solve this problem effectively is therefore a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a control system for a server cluster that effectively solves the cluster split-brain problem and prevents the second controller group from competing with the first controller group for shared resources, which would cause data inconsistency in the server cluster and affect its normal operation.
To solve the above technical problem, the present invention provides the following technical solution:
A control system for a server cluster, comprising a first controller group, a second controller group and a third controller group, wherein every two controller groups are communicatively connected and the second controller group is the disaster-recovery controller group of the first controller group, wherein:
The first controller group is configured to perform high-availability work in the server cluster and execute control tasks;
The second controller group is configured to take over the control tasks of the first controller group when the first controller group fails;
The third controller group is configured to, upon determining that the communication link between the first controller group and the second controller group has failed, decide according to the current working state of the first controller group whether the first controller group continues working or the second controller group takes over the control tasks of the first controller group.
In one specific embodiment of the present invention,
the third controller group is specifically configured to determine that the second controller group takes over the control tasks of the first controller group when the first controller group is currently not in a normal working state.
In one specific embodiment of the present invention,
the third controller group is specifically configured to determine that the first controller group continues working when the first controller group is currently in a normal working state.
In one specific embodiment of the present invention,
the third controller group is further configured to stop the work of the second controller group if, while the first controller group is currently in a normal working state, it determines that the second controller group is also currently in a normal working state.
In one specific embodiment of the present invention, the first controller group comprises a first controller and a second controller, which are communicatively connected and serve as standby controllers for each other.
In one specific embodiment of the present invention, the second controller group comprises a third controller and a fourth controller, which are communicatively connected and serve as standby controllers for each other.
In one specific embodiment of the present invention, the third controller group comprises a fifth controller and a sixth controller, which are communicatively connected and serve as standby controllers for each other.
With the technical solution provided by the embodiments of the present invention, every two controller groups are communicatively connected. When the third controller group determines that the communication link between the first controller group and the second controller group has failed, it can decide, according to the current working state of the first controller group, whether the first controller group continues working or the second controller group takes over the control tasks of the first controller group. This prevents the second controller group from competing with the first controller group for shared resources, which would cause data inconsistency in the server cluster and affect its normal operation.
Brief description of the drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural diagram of a control system for a server cluster in an embodiment of the present invention;
Fig. 2 is a diagram of the operation of the first controller group and the second controller group in an embodiment of the present invention;
Fig. 3 is a diagram of arbitration by the third controller group in an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a structural diagram of a control system for a server cluster provided by an embodiment of the present invention. The system includes a first controller group 110, a second controller group 120 and a third controller group 130. Every two controller groups are communicatively connected, and the second controller group 120 is the disaster-recovery controller group of the first controller group 110.
The first controller group 110 is configured to perform high-availability work in the server cluster and execute control tasks;
The second controller group 120 is configured to take over the control tasks of the first controller group 110 when the first controller group 110 fails;
The third controller group 130 is configured to, upon determining that the communication link between the first controller group 110 and the second controller group 120 has failed, decide according to the current working state of the first controller group 110 whether the first controller group 110 continues working or the second controller group 120 takes over the control tasks of the first controller group 110.
In the embodiments of the present invention, the control system of the server cluster includes three controller groups: the first controller group 110, the second controller group 120 and the third controller group 130. Each controller group may contain one or more controllers. Every two controller groups are communicatively connected, so that messages can be transmitted and working states can be monitored between them. The second controller group 120 is the disaster-recovery controller group of the first controller group 110.
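As a small illustration of the topology just described, the pairwise connection of three controller groups can be modelled as a full mesh of three links. The identifiers `group1`, `group2` and `group3` (standing for groups 110, 120 and 130) and the helper `peers` are assumptions made for this sketch, not names from the patent.

```python
from itertools import combinations

# Hypothetical identifiers for controller groups 110, 120 and 130.
GROUPS = ("group1", "group2", "group3")

# Every two controller groups are connected: three groups give three links.
LINKS = {frozenset(pair) for pair in combinations(GROUPS, 2)}


def peers(group: str) -> list:
    """Return the groups that share a direct link with `group`."""
    return sorted(g for link in LINKS if group in link for g in link if g != group)


print(len(LINKS))       # 3
print(peers("group3"))  # ['group1', 'group2']
```

Because the third group has its own link to each of the other two, it can still observe the first group when the link between the first and second groups is down.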
In the embodiments of the present invention, the first controller group 110 performs high-availability work in the server cluster and executes control tasks. The second controller group 120, as the disaster-recovery controller group of the first controller group 110, can learn through its communication link with the first controller group 110 whether the first controller group 110 has currently failed. When the first controller group 110 fails, the second controller group 120 can take over the control tasks of the first controller group 110.
The third controller group 130 is communicatively connected to the first controller group 110 and to the second controller group 120. When the communication link between the first controller group 110 and the second controller group 120 fails, both the first controller group 110 and the second controller group 120 send corresponding fault information to the third controller group 130. From the fault information sent by the first controller group 110 and/or the second controller group 120, the third controller group 130 can determine that the communication link between the first controller group 110 and the second controller group 120 has failed.
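One way to picture this detection step is the sketch below. The patent does not specify a message format, so the `FaultReport` record, its fields, and the group identifiers are hypothetical; the only behaviour taken from the text is that a report from either group (the "and/or" case) is enough for the third group to conclude that the link has failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultReport:
    """Hypothetical fault message sent to the third controller group."""
    reporter: str       # group that sent the report, e.g. "group1"
    failed_link: tuple  # the two groups at the ends of the failed link


def inter_group_link_failed(reports) -> bool:
    """True if group1 and/or group2 reported the group1-group2 link as failed."""
    link = frozenset({"group1", "group2"})
    return any(
        r.reporter in link and frozenset(r.failed_link) == link
        for r in reports
    )


only_second = [FaultReport("group2", ("group1", "group2"))]
print(inter_group_link_failed(only_second))  # True
print(inter_group_link_failed([]))           # False
```

Accepting a single report matters in practice: a failed first group may be unable to send anything, so waiting for both reports would stall the arbitration.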
The third controller group 130 can determine the current working state of the first controller group 110 through its communication link with the first controller group 110. It can then decide, according to that working state, whether the first controller group 110 continues working or the second controller group 120 takes over the control tasks of the first controller group 110. The third controller group 130 may also be called the arbitration controller group. In this way, the second controller group 120 is prevented from competing with the first controller group 110 for shared resources when it cannot learn the working state of the first controller group 110.
Fig. 2 shows the operation of the first controller group and the second controller group. Site1 is the first controller group and site2 is the second controller group, with a communication link between site1 and site2. While this communication link is connected normally, site2 can learn through it whether site1 has currently failed. In this case, the third controller group need not perform any operation. When this communication link fails, site1 and site2 send corresponding fault information to site3. As shown in Fig. 3, site3 then arbitrates and decides whether site1 continues working or site2 takes over the control tasks of site1.
With the system provided by the embodiments of the present invention, every two controller groups are communicatively connected. When the third controller group determines that the communication link between the first controller group and the second controller group has failed, it can decide, according to the current working state of the first controller group, whether the first controller group continues working or the second controller group takes over the control tasks of the first controller group. This prevents the second controller group from competing with the first controller group for shared resources, which would cause data inconsistency in the server cluster and affect its normal operation.
In one specific embodiment of the present invention, the third controller group 130 may specifically determine that the second controller group 120 takes over the control tasks of the first controller group 110 when the first controller group 110 is currently not in a normal working state.
When the third controller group 130 determines that the communication link between the first controller group 110 and the second controller group 120 has failed and the first controller group 110 is currently not in a normal working state, the third controller group 130 can conclude that the first controller group 110 has failed, and can therefore determine that the second controller group 120 takes over the control tasks of the first controller group 110.
In another specific embodiment of the present invention, the third controller group 130 may specifically determine that the first controller group 110 continues working when the first controller group 110 is currently in a normal working state.
When the third controller group 130 determines that the communication link between the first controller group 110 and the second controller group 120 has failed but the first controller group 110 is currently in a normal working state, the third controller group 130 can conclude that only the communication link between the first controller group 110 and the second controller group 120 has failed and that the first controller group 110 itself has not. In this case, it can determine that the first controller group 110 continues working.
In one specific embodiment of the present invention, the third controller group 130 is further configured to stop the work of the second controller group 120 if, while the first controller group 110 is currently in a normal working state, it determines that the second controller group 120 is also currently in a normal working state.
When the third controller group 130 determines that the communication link between the first controller group 110 and the second controller group 120 has failed but the first controller group 110 is currently in a normal working state, and it also determines that the second controller group 120 is currently in a normal working state, it can stop the work of the second controller group 120. This prevents the second controller group 120 from competing with the first controller group 110 for shared resources.
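The three arbitration embodiments above can be combined into one decision function. The following is a minimal sketch under stated assumptions: the names `Action`, `Decision` and `arbitrate` are hypothetical, and the working states are reduced to booleans that the third controller group is assumed to have obtained over its own links.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    FIRST_CONTINUES = "first controller group continues working"
    SECOND_TAKES_OVER = "second controller group takes over"


@dataclass(frozen=True)
class Decision:
    action: Optional[Action]  # None means no arbitration was needed
    stop_second_group: bool   # True if the second group must be stopped


def arbitrate(link_failed: bool, first_normal: bool, second_normal: bool) -> Decision:
    """Decision of the third (arbitration) controller group."""
    if not link_failed:
        # Link is healthy: the second group can monitor the first group itself.
        return Decision(action=None, stop_second_group=False)
    if not first_normal:
        # The first group has really failed: the standby group takes over.
        return Decision(Action.SECOND_TAKES_OVER, stop_second_group=False)
    # Only the link failed: the first group keeps its tasks, and a healthy
    # second group is stopped so it cannot compete for shared resources.
    return Decision(Action.FIRST_CONTINUES, stop_second_group=second_normal)


print(arbitrate(link_failed=True, first_normal=True, second_normal=True))
```

In every branch at most one group is left in charge of the shared resources, which is the property the arbitration is meant to guarantee.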
In one embodiment of the present invention, the first controller group 110 may comprise a first controller and a second controller, which are communicatively connected and serve as standby controllers for each other.
In practice, the first controller can be set as the primary controller and the second controller as the standby controller. External communication of the first controller group 110 is handled by the first controller. When the first controller fails, the second controller is started and takes over the tasks of the first controller.
In another embodiment of the present invention, the second controller group 120 may comprise a third controller and a fourth controller, which are communicatively connected and serve as standby controllers for each other.
Likewise, in practice, the third controller can be set as the primary controller and the fourth controller as the standby controller. External communication of the second controller group 120 is handled by the third controller. When the third controller fails, the fourth controller is started and takes over the tasks of the third controller.
In another embodiment of the present invention, the third controller group 130 may comprise a fifth controller and a sixth controller, which are communicatively connected and serve as standby controllers for each other.
Likewise, in practice, the fifth controller can be set as the primary controller and the sixth controller as the standby controller. External communication of the third controller group 130 is handled by the fifth controller. When the fifth controller fails, the sixth controller is started and takes over the tasks of the fifth controller.
The standby controllers improve the fault tolerance of each controller group.
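The primary/standby arrangement inside a group can be sketched as follows. The class `ControllerGroup` and its methods are hypothetical, and treating a group as being in a normal working state whenever at least one of its two controllers still works is an assumption of this sketch; the patent does not define the group state that precisely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ControllerGroup:
    primary: str
    standby: str
    failed: set = field(default_factory=set)

    def report_failure(self, controller: str) -> None:
        self.failed.add(controller)

    def active_controller(self) -> Optional[str]:
        # The primary handles external communication while it is healthy;
        # otherwise the standby is started and takes over its tasks.
        for controller in (self.primary, self.standby):
            if controller not in self.failed:
                return controller
        return None

    def in_normal_state(self) -> bool:
        # Assumption: the group is "normal" while any controller still works.
        return self.active_controller() is not None


group = ControllerGroup(primary="controller-1", standby="controller-2")
group.report_failure("controller-1")
print(group.active_controller())  # controller-2
print(group.in_normal_state())    # True
```

With this arrangement a single controller failure is absorbed inside the group and never reaches the arbitration step between groups.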
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be understood by referring to one another.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
Specific examples are used herein to explain the principle and embodiments of the present invention. The description of the above embodiments is intended only to help in understanding the technical solution of the present invention and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.

Claims (7)

1. A control system for a server cluster, characterized in that it comprises a first controller group, a second controller group and a third controller group, wherein every two controller groups are communicatively connected and the second controller group is the disaster-recovery controller group of the first controller group, wherein:
the first controller group is configured to perform high-availability work in the server cluster and execute control tasks;
the second controller group is configured to take over the control tasks of the first controller group when the first controller group fails;
the third controller group is configured to, upon determining that the communication link between the first controller group and the second controller group has failed, decide according to the current working state of the first controller group whether the first controller group continues working or the second controller group takes over the control tasks of the first controller group.
2. The control system for a server cluster according to claim 1, characterized in that
the third controller group is specifically configured to determine that the second controller group takes over the control tasks of the first controller group when the first controller group is currently not in a normal working state.
3. The control system for a server cluster according to claim 1, characterized in that
the third controller group is specifically configured to determine that the first controller group continues working when the first controller group is currently in a normal working state.
4. The control system for a server cluster according to claim 3, characterized in that
the third controller group is further configured to stop the work of the second controller group if, while the first controller group is currently in a normal working state, it determines that the second controller group is also currently in a normal working state.
5. The control system for a server cluster according to any one of claims 1 to 4, characterized in that the first controller group comprises a first controller and a second controller, which are communicatively connected and serve as standby controllers for each other.
6. The control system for a server cluster according to claim 5, characterized in that the second controller group comprises a third controller and a fourth controller, which are communicatively connected and serve as standby controllers for each other.
7. The control system for a server cluster according to claim 6, characterized in that the third controller group comprises a fifth controller and a sixth controller, which are communicatively connected and serve as standby controllers for each other.
CN201610970085.5A 2016-10-28 2016-10-28 Control system of server cluster Pending CN106452696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610970085.5A CN106452696A (en) 2016-10-28 2016-10-28 Control system of server cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610970085.5A CN106452696A (en) 2016-10-28 2016-10-28 Control system of server cluster

Publications (1)

Publication Number Publication Date
CN106452696A true CN106452696A (en) 2017-02-22

Family

ID=58180677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610970085.5A Pending CN106452696A (en) 2016-10-28 2016-10-28 Control system of server cluster

Country Status (1)

Country Link
CN (1) CN106452696A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102916825A (en) * 2011-08-01 2013-02-06 中兴通讯股份有限公司 Management equipment of dual-computer hot standby system, management method and dual-computer hot standby system
US20150113340A1 (en) * 2012-06-29 2015-04-23 Huawei Technologies Co., Ltd. Method and apparatus for implementing heartbeat service of high availability cluster
CN103596652A (en) * 2013-07-30 2014-02-19 华为技术有限公司 Network control method and device
CN103607310A (en) * 2013-11-29 2014-02-26 华为技术有限公司 Method for arbitration of remote disaster recovery
CN103905247A (en) * 2014-03-10 2014-07-02 北京交通大学 Two-unit standby method and system based on multi-client judgment
CN105893176A (en) * 2016-03-28 2016-08-24 杭州宏杉科技有限公司 Management method and device of network storage system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107342902A (en) * 2017-07-14 2017-11-10 郑州云海信息技术有限公司 A kind of link reconfiguration method and system of four controls server
CN110529986A (en) * 2019-09-25 2019-12-03 珠海格力电器股份有限公司 Dual system communicates topological structure, dual system air conditioner
CN110529986B (en) * 2019-09-25 2020-09-15 珠海格力电器股份有限公司 Dual-system communication topological structure and dual-system air conditioner
CN114484766A (en) * 2021-12-21 2022-05-13 珠海格力电器股份有限公司 Method for determining master controller and related equipment

Similar Documents

Publication Publication Date Title
US9319284B2 (en) Operation delay monitoring method, operation management apparatus, and operation management program
CN102622279B (en) Redundancy control system, method and Management Controller
US9838245B2 (en) Systems and methods for improved fault tolerance in solicited information handling systems
JP2005209201A (en) Node management in high-availability cluster
CN105681077A (en) Fault processing method, device and system
CN106452696A (en) Control system of server cluster
CN111628941A (en) Network traffic classification processing method, device, equipment and medium
WO2015190934A1 (en) Method and system for controlling well operations
CN112380089A (en) Data center monitoring and early warning method and system
CN107729190A (en) A kind of I/O path failure metastasis treating method and system
CN104125049A (en) Redundancy implementation method of PCIE (Peripheral Component Interface Express) device based on BRICKLAND platform
CN102495786B (en) Server system
CN106549865A (en) Method, device and software defined network SDN controllers that service dynamic is recovered
CN103338240B (en) The Cloud Server automatic monitored control system of monitoring automatic drift and method
KR20150124642A (en) Communication failure recover method of parallel-connecte server system
CN104320285A (en) Website running status monitoring method and device
CN112202613B (en) Optical cable fault processing method, device, equipment and computer readable storage medium
CN108983695A (en) A kind of master-slave switching method and device based on Complex Programmable Logic Devices
CN105530110A (en) Network failure detection method and related network elements
CN103326880A (en) Genesys calling system high-availability cloud computing system and method
US20150154130A1 (en) Method for operating an automation device
WO2021077797A1 (en) Quality of service measurement method and device, and user plane function
CN108021463A (en) A kind of GPU failure management methods based on finite state machine
CN105045691B (en) A kind of fault detection method and system
CN107896176A (en) A kind of processing method of calculate node, intelligent terminal and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170222
