CN106452696A - Control system of server cluster - Google Patents
Control system of server cluster
- Publication number: CN106452696A
- Application number: CN201610970085.5A
- Authority: CN (China)
- Prior art keywords: controller; controller group; group; server cluster; control system
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/22—Arrangements for detecting or preventing errors in the information received using redundant apparatus to increase reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0659—Management of faults, events, alarms or notifications using network fault recovery by isolating or reconfiguring faulty entities
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Hardware Redundancy (AREA)
Abstract
The invention discloses a control system for a server cluster. The system comprises a first controller group, a second controller group and a third controller group, with a communication connection between every two controller groups. The first controller group performs high-availability work in the server cluster and executes the control task. The second controller group takes over the control task of the first controller group when the first controller group fails. The third controller group, on determining that the communication link between the first controller group and the second controller group has failed, decides according to the current working state of the first controller group whether the first controller group continues working or the second controller group takes over its control task. The technical solution provided by the embodiments of the invention prevents the second controller group from contending with the first controller group for shared resources, which would otherwise cause data inconsistency in the server cluster and disrupt its normal operation.
Description
Technical field
The present invention relates to the field of computer application technology, and in particular to a control system for a server cluster.
Background
With the rapid development of computer technology, more and more industries need to use server clusters. The normal operation of the control system in a server cluster is very important to the cluster.
In the control system of a server cluster, the controllers are connected by physical links and communicate with each other. In practice, one of the controllers acts as the management controller, which holds management authority over the entire control system. When the communication link between controllers fails while each individual controller can still work normally, every controller concludes that the other controllers are offline and that it is the only controller online in the whole control system, and therefore that it should obtain management authority over the entire control system. The controllers then contend for shared resources, and different controllers may perform their own read and write operations on the same shared resource, leading to inconsistent data. This situation is known as split-brain.
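The failure mode described above can be illustrated with a minimal sketch. The `Controller` class, its `peers_visible` field and the `claimants` helper are hypothetical names introduced only for illustration; they are not part of the patent. The sketch assumes a controller claims management authority whenever it can see no peer.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller that tracks which peers it can currently reach."""
    name: str
    peers_visible: set = field(default_factory=set)

    def believes_sole_survivor(self) -> bool:
        # With no visible peers, the controller assumes everyone else is offline.
        return not self.peers_visible


def claimants(controllers):
    """Return the names of controllers that would claim management authority."""
    return [c.name for c in controllers if c.believes_sole_survivor()]


# Healthy link: each controller sees the other, so nobody claims sole authority.
a = Controller("A", {"B"})
b = Controller("B", {"A"})
healthy = claimants([a, b])

# Link failure: both controllers still run, but neither sees the other.
a.peers_visible.clear()
b.peers_visible.clear()
partitioned = claimants([a, b])
```

After the link failure both controllers claim authority at once, which is the condition under which they would write to the same shared resource independently.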
There is currently no satisfactory method for handling the cluster split-brain problem. How to solve it effectively is therefore a technical problem that those skilled in the art urgently need to address.
Summary of the invention
The object of the present invention is to provide a control system for a server cluster that effectively solves the cluster split-brain problem and prevents the second controller group from contending with the first controller group for shared resources, which would cause data inconsistency in the server cluster and affect its normal operation.
To solve the above technical problem, the present invention provides the following technical solution:
A control system for a server cluster comprises a first controller group, a second controller group and a third controller group. Every two controller groups are communicatively connected, and the second controller group is the disaster-recovery controller group of the first controller group, wherein:
the first controller group is configured to perform high-availability work in the server cluster and execute the control task;
the second controller group is configured to take over the control task of the first controller group when the first controller group fails;
the third controller group is configured to, on determining that the communication link between the first controller group and the second controller group has failed, decide according to the current working state of the first controller group whether the first controller group continues working or the second controller group takes over the control task of the first controller group.
In one specific embodiment of the present invention, the third controller group is specifically configured to determine, when the first controller group is not currently in a normal working state, that the second controller group takes over the control task of the first controller group.
In one specific embodiment of the present invention, the third controller group is specifically configured to determine, when the first controller group is currently in a normal working state, that the first controller group continues working.
In one specific embodiment of the present invention, the third controller group is further configured to stop the second controller group from working if, while the first controller group is currently in a normal working state, the second controller group is determined to be currently in a normal working state as well.
In one specific embodiment of the present invention, the first controller group comprises a first controller and a second controller, which are communicatively connected and serve as standby controllers for each other.
In one specific embodiment of the present invention, the second controller group comprises a third controller and a fourth controller, which are communicatively connected and serve as standby controllers for each other.
In one specific embodiment of the present invention, the third controller group comprises a fifth controller and a sixth controller, which are communicatively connected and serve as standby controllers for each other.
With the technical solution provided by the embodiments of the present invention, every two controller groups are communicatively connected. On determining that the communication link between the first controller group and the second controller group has failed, the third controller group can decide, according to the current working state of the first controller group, whether the first controller group continues working or the second controller group takes over the control task of the first controller group. This prevents the second controller group from contending with the first controller group for shared resources, which would cause data inconsistency in the server cluster and affect its normal operation.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a control system of a server cluster in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the operation of the first controller group and the second controller group in an embodiment of the present invention;
Fig. 3 is a schematic diagram of arbitration by the third controller group in an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which is a schematic structural diagram of a control system of a server cluster provided by an embodiment of the present invention, the system comprises a first controller group 110, a second controller group 120 and a third controller group 130. Every two controller groups are communicatively connected, and the second controller group 120 is the disaster-recovery controller group of the first controller group 110.
The first controller group 110 is configured to perform high-availability work in the server cluster and execute the control task;
the second controller group 120 is configured to take over the control task of the first controller group 110 when the first controller group 110 fails;
the third controller group 130 is configured to, on determining that the communication link between the first controller group 110 and the second controller group 120 has failed, decide according to the current working state of the first controller group 110 whether the first controller group 110 continues working or the second controller group 120 takes over the control task of the first controller group 110.
In the embodiments of the present invention, the control system of the server cluster comprises three controller groups: the first controller group 110, the second controller group 120 and the third controller group 130. Each controller group may contain one or more controllers. Every two controller groups are communicatively connected, so that messages can be transmitted and working states monitored. The second controller group 120 is the disaster-recovery controller group of the first controller group 110.
In the embodiments of the present invention, the first controller group 110 performs high-availability work in the server cluster and executes the control task. As the disaster-recovery controller group of the first controller group 110, the second controller group 120 can learn through its communication link with the first controller group 110 whether the first controller group 110 has currently failed. When the first controller group 110 fails, the second controller group 120 can take over the control task of the first controller group 110.
The third controller group 130 is communicatively connected to the first controller group 110 and to the second controller group 120. When the communication link between the first controller group 110 and the second controller group 120 fails, both the first controller group 110 and the second controller group 120 send corresponding fault information to the third controller group 130. From the fault information sent by the first controller group 110 and/or the second controller group 120, the third controller group 130 can determine that the communication link between the first controller group 110 and the second controller group 120 has failed.
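As a rough sketch of this detection step, the arbiter could treat the link as failed as soon as either endpoint reports it, matching the "and/or" wording above. The report format, the `WATCHED_LINK` constant and the function name are assumptions made for illustration; the patent does not specify a message format.

```python
from typing import Iterable, Tuple

# Hypothetical identifiers for the two groups whose mutual link is being watched.
WATCHED_LINK = frozenset({"site1", "site2"})


def link_fault_reported(reports: Iterable[Tuple[str, Tuple[str, str]]]) -> bool:
    """Return True if either endpoint of the watched link reports that link as failed.

    Each report is (sender, (endpoint_a, endpoint_b)); this layout is an assumption.
    """
    for sender, endpoints in reports:
        if sender in WATCHED_LINK and frozenset(endpoints) == WATCHED_LINK:
            return True
    return False


both = [("site1", ("site1", "site2")), ("site2", ("site2", "site1"))]
only_one = [("site2", ("site1", "site2"))]
unrelated = [("site1", ("site1", "site3"))]
```

A single report is enough here because a failed first controller group may be unable to send its own report.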
The third controller group 130 can determine the current working state of the first controller group 110 through its own communication link with the first controller group 110. It can then decide, according to that working state, whether the first controller group 110 continues working or the second controller group 120 takes over the control task of the first controller group 110. The third controller group 130 may also be called the arbitration controller group. This prevents the second controller group 120 from contending with the first controller group 110 for shared resources when it cannot learn the working state of the first controller group 110.
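The arbitration step can be sketched as follows. The `Decision` enum, the probe callable and the rule that an unreachable first controller group counts as not working are assumptions for illustration; the patent only states that the working state is obtained over the link between the third and first controller groups.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    SITE1_CONTINUES = "site1 continues working"
    SITE2_TAKES_OVER = "site2 takes over the control task"


def site1_working(probe: Callable[[], bool]) -> bool:
    """Query site1 over the site3-site1 link.

    Assumption: if the probe cannot reach site1, site1 is treated as not working.
    """
    try:
        return bool(probe())
    except ConnectionError:
        return False


def arbitrate(probe_site1: Callable[[], bool]) -> Decision:
    """Called by the arbiter once the site1-site2 link is known to have failed."""
    if site1_working(probe_site1):
        return Decision.SITE1_CONTINUES
    return Decision.SITE2_TAKES_OVER


def unreachable() -> bool:
    # Stand-in for a probe that gets no reply from site1.
    raise ConnectionError("no reply from site1")
```

For example, `arbitrate(lambda: True)` keeps the first controller group in charge, while `arbitrate(unreachable)` hands the control task to the second controller group.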
Fig. 2 is a schematic diagram of the operation of the first controller group and the second controller group. Site1 is the first controller group and site2 is the second controller group, with a communication link between site1 and site2. While this communication link is connected normally, site2 can learn through it whether site1 has currently failed, and in this case the third controller group need not perform any operation. When this communication link fails, site1 and site2 send corresponding fault information to site3. As shown in Fig. 3, site3 then arbitrates and decides whether site1 continues working or site2 takes over the control task of site1.
With the system provided by the embodiments of the present invention, every two controller groups are communicatively connected. On determining that the communication link between the first controller group and the second controller group has failed, the third controller group can decide, according to the current working state of the first controller group, whether the first controller group continues working or the second controller group takes over the control task of the first controller group. This prevents the second controller group from contending with the first controller group for shared resources, which would cause data inconsistency in the server cluster and affect its normal operation.
In one specific embodiment of the present invention, the third controller group 130 may be specifically configured to determine, when the first controller group 110 is not currently in a normal working state, that the second controller group 120 takes over the control task of the first controller group 110.
When the third controller group 130 determines that the communication link between the first controller group 110 and the second controller group 120 has failed and the first controller group 110 is not currently in a normal working state, the third controller group 130 can conclude that the first controller group 110 has failed, and can therefore determine that the second controller group 120 takes over the control task of the first controller group 110.
In another specific embodiment of the present invention, the third controller group 130 may be specifically configured to determine, when the first controller group 110 is currently in a normal working state, that the first controller group 110 continues working.
When the third controller group 130 determines that the communication link between the first controller group 110 and the second controller group 120 has failed but the first controller group 110 is currently in a normal working state, the third controller group 130 can conclude that only the communication link between the first controller group 110 and the second controller group 120 has failed and that the first controller group 110 itself has not. In this case, it can determine that the first controller group 110 continues working.
In one specific embodiment of the present invention, the third controller group 130 is further configured to stop the second controller group 120 from working if, while the first controller group 110 is currently in a normal working state, the second controller group 120 is determined to be currently in a normal working state as well.
When the third controller group 130 determines that the communication link between the first controller group 110 and the second controller group 120 has failed but the first controller group 110 is currently in a normal working state, and it also determines that the second controller group 120 is currently in a normal working state, it can stop the second controller group 120 from working. This prevents the second controller group 120 from contending with the first controller group 110 for shared resources.
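Taken together, the three embodiments above amount to a small decision table that the third controller group applies once the link between site1 and site2 has failed. The following is a minimal sketch; the function name and the action strings are hypothetical, and the case where neither group is in a normal working state is not addressed by the source, so the sketch simply applies the takeover rule there.

```python
from typing import List


def plan_actions(site1_normal: bool, site2_normal: bool) -> List[str]:
    """List the arbiter's actions after the site1-site2 link has failed."""
    actions: List[str] = []
    if site1_normal:
        # Only the link failed; site1 keeps the control task.
        actions.append("site1 continues working")
        if site2_normal:
            # Stop site2 so it cannot contend with site1 for shared resources.
            actions.append("stop site2")
    else:
        # site1 itself has failed; the source does not cover site2 also being down.
        actions.append("site2 takes over the control task")
    return actions
```

Stopping the second controller group in the first branch is what rules out two active groups writing to the shared resources at the same time.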
In one embodiment of the present invention, the first controller group 110 may comprise a first controller and a second controller, which are communicatively connected and serve as standby controllers for each other.
In practice, the first controller can be set as the master controller and the second controller as the standby controller. The first controller handles the external communication of the first controller group 110. When the first controller fails, the second controller is started and takes over the task of the first controller.
In another embodiment of the present invention, the second controller group 120 may comprise a third controller and a fourth controller, which are communicatively connected and serve as standby controllers for each other.
Likewise, in practice, the third controller can be set as the master controller and the fourth controller as the standby controller. The third controller handles the external communication of the second controller group 120. When the third controller fails, the fourth controller is started and takes over the task of the third controller.
In another embodiment of the present invention, the third controller group 130 may comprise a fifth controller and a sixth controller, which are communicatively connected and serve as standby controllers for each other.
Likewise, in practice, the fifth controller can be set as the master controller and the sixth controller as the standby controller. The fifth controller handles the external communication of the third controller group 130. When the fifth controller fails, the sixth controller is started and takes over the task of the fifth controller.
The standby controllers improve the fault tolerance of each controller group.
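The master/standby arrangement inside a group can be sketched as below. The `ControllerGroup` class and its methods are hypothetical names; the sketch assumes that a failure is simply recorded by name and that the standby becomes active only once the master is marked as failed.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ControllerGroup:
    """Hypothetical two-controller group with one master and one standby."""
    master: str
    standby: str
    failed: Set[str] = field(default_factory=set)

    def mark_failed(self, controller: str) -> None:
        self.failed.add(controller)

    def active(self) -> Optional[str]:
        """Controller that handles external communication, or None if both failed."""
        if self.master not in self.failed:
            return self.master
        if self.standby not in self.failed:
            return self.standby
        return None


group = ControllerGroup(master="controller1", standby="controller2")
before = group.active()
group.mark_failed("controller1")
after = group.active()
group.mark_failed("controller2")
none_left = group.active()
```

Under this sketch the group as a whole stops working only when `active()` returns `None`, which is the group-level failure the other controller groups would react to.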
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the particular application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
Specific examples are used herein to explain the principle and embodiments of the present invention. The description of the above embodiments is only intended to help in understanding the technical solution of the present invention and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the scope of protection of the claims of the present invention.
Claims (7)
1. A control system for a server cluster, characterized by comprising a first controller group, a second controller group and a third controller group, wherein every two controller groups are communicatively connected and the second controller group is the disaster-recovery controller group of the first controller group, and wherein:
the first controller group is configured to perform high-availability work in the server cluster and execute the control task;
the second controller group is configured to take over the control task of the first controller group when the first controller group fails;
the third controller group is configured to, on determining that the communication link between the first controller group and the second controller group has failed, decide according to the current working state of the first controller group whether the first controller group continues working or the second controller group takes over the control task of the first controller group.
2. The control system for a server cluster according to claim 1, characterized in that the third controller group is specifically configured to determine, when the first controller group is not currently in a normal working state, that the second controller group takes over the control task of the first controller group.
3. The control system for a server cluster according to claim 1, characterized in that the third controller group is specifically configured to determine, when the first controller group is currently in a normal working state, that the first controller group continues working.
4. The control system for a server cluster according to claim 3, characterized in that the third controller group is further configured to stop the second controller group from working if, while the first controller group is currently in a normal working state, the second controller group is determined to be currently in a normal working state.
5. The control system for a server cluster according to any one of claims 1 to 4, characterized in that the first controller group comprises a first controller and a second controller, which are communicatively connected and serve as standby controllers for each other.
6. The control system for a server cluster according to claim 5, characterized in that the second controller group comprises a third controller and a fourth controller, which are communicatively connected and serve as standby controllers for each other.
7. The control system for a server cluster according to claim 6, characterized in that the third controller group comprises a fifth controller and a sixth controller, which are communicatively connected and serve as standby controllers for each other.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610970085.5A CN106452696A (en) | 2016-10-28 | 2016-10-28 | Control system of server cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610970085.5A CN106452696A (en) | 2016-10-28 | 2016-10-28 | Control system of server cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106452696A (en) | 2017-02-22 |
Family
ID=58180677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610970085.5A Pending CN106452696A (en) | 2016-10-28 | 2016-10-28 | Control system of server cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106452696A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102916825A (en) * | 2011-08-01 | 2013-02-06 | 中兴通讯股份有限公司 | Management equipment of dual-computer hot standby system, management method and dual-computer hot standby system |
CN103596652A (en) * | 2013-07-30 | 2014-02-19 | 华为技术有限公司 | Network control method and device |
CN103607310A (en) * | 2013-11-29 | 2014-02-26 | 华为技术有限公司 | Method for arbitration of remote disaster recovery |
CN103905247A (en) * | 2014-03-10 | 2014-07-02 | 北京交通大学 | Two-unit standby method and system based on multi-client judgment |
US20150113340A1 (en) * | 2012-06-29 | 2015-04-23 | Huawei Technologies Co., Ltd. | Method and apparatus for implementing heartbeat service of high availability cluster |
CN105893176A (en) * | 2016-03-28 | 2016-08-24 | 杭州宏杉科技有限公司 | Management method and device of network storage system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107342902A (en) * | 2017-07-14 | 2017-11-10 | 郑州云海信息技术有限公司 | A kind of link reconfiguration method and system of four controls server |
CN110529986A (en) * | 2019-09-25 | 2019-12-03 | 珠海格力电器股份有限公司 | Dual system communicates topological structure, dual system air conditioner |
CN110529986B (en) * | 2019-09-25 | 2020-09-15 | 珠海格力电器股份有限公司 | Dual-system communication topological structure and dual-system air conditioner |
CN114484766A (en) * | 2021-12-21 | 2022-05-13 | 珠海格力电器股份有限公司 | Method for determining master controller and related equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9319284B2 (en) | Operation delay monitoring method, operation management apparatus, and operation management program | |
CN102622279B (en) | Redundancy control system, method and Management Controller | |
US9838245B2 (en) | Systems and methods for improved fault tolerance in solicited information handling systems | |
JP2005209201A (en) | Node management in high-availability cluster | |
CN105681077A (en) | Fault processing method, device and system | |
CN106452696A (en) | Control system of server cluster | |
CN111628941A (en) | Network traffic classification processing method, device, equipment and medium | |
WO2015190934A1 (en) | Method and system for controlling well operations | |
CN112380089A (en) | Data center monitoring and early warning method and system | |
CN107729190A (en) | A kind of I/O path failure metastasis treating method and system | |
CN104125049A (en) | Redundancy implementation method of PCIE (Peripheral Component Interface Express) device based on BRICKLAND platform | |
CN102495786B (en) | Server system | |
CN106549865A (en) | Method, device and software defined network SDN controllers that service dynamic is recovered | |
CN103338240B (en) | The Cloud Server automatic monitored control system of monitoring automatic drift and method | |
KR20150124642A (en) | Communication failure recover method of parallel-connecte server system | |
CN104320285A (en) | Website running status monitoring method and device | |
CN112202613B (en) | Optical cable fault processing method, device, equipment and computer readable storage medium | |
CN108983695A (en) | A kind of master-slave switching method and device based on Complex Programmable Logic Devices | |
CN105530110A (en) | Network failure detection method and related network elements | |
CN103326880A (en) | Genesys calling system high-availability cloud computing system and method | |
US20150154130A1 (en) | Method for operating an automation device | |
WO2021077797A1 (en) | Quality of service measurement method and device, and user plane function | |
CN108021463A (en) | A kind of GPU failure management methods based on finite state machine | |
CN105045691B (en) | A kind of fault detection method and system | |
CN107896176A (en) | A kind of processing method of calculate node, intelligent terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170222 |