CN104169905A - Configurable and fault-tolerant baseboard management controller arrangement - Google Patents

Info

Publication number
CN104169905A
Authority
CN
China
Prior art keywords
bmc
controller
role
node
except
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201280071730.XA
Other languages
Chinese (zh)
Other versions
CN104169905B (en)
Inventor
D. Richardson
B. Kennedy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to CN201711408176.0A (CN107977299B)
Publication of CN104169905A
Application granted
Publication of CN104169905B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G06F11/2007 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant using redundant communication media
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)

Abstract

Systems and methods utilize a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system. In one example, the method may include designating a first BMC of a plurality of BMCs to the role of master BMC, determining that the first BMC can no longer serve in the role of master BMC, and designating a BMC other than the first BMC to serve in the role of master BMC.

Description

Configurable and fault-tolerant baseboard management controller arrangement
Background
Technical Field
Embodiments generally relate to mitigating the impact of failures in multi-node server systems. More specifically, embodiments relate to baseboard management controller (BMC) arrangements used in multi-node servers.
Discussion
Server systems may use a single chassis containing a plurality of nodes. These server systems may use a chassis management controller, separate from the nodes, to manage system resources centrally. However, if the chassis management controller fails, operation of the entire system may fail. It may therefore be useful to implement a configurable and fault-tolerant server system.
Embodiments may therefore provide a method of using a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, including detecting a plurality of BMCs, designating a first BMC of the plurality of BMCs to the role of master BMC, and designating a BMC other than the first BMC to the role of slave BMC. The method may also include transmitting, by the first BMC, information relating to the first BMC's role as master BMC, determining that the first BMC can no longer serve in the role of master BMC, and designating the BMC other than the first BMC to serve in the role of master BMC. In addition, the method may include the BMC other than the first BMC assuming the role of master BMC, using the information relating to the first BMC's role as master BMC to do so.
In one example, the method may include implementing a timeout period to allow objections to the designation of the BMC other than the first BMC to serve in the role of master BMC.
In one example, the role of master BMC includes serving as a central interface for at least one of monitoring, managing, supporting, and controlling aspects of the multi-node system.
In another example, at least one of the following is determined based on an algorithm: the designation of the first BMC of the plurality of BMCs to serve in the role of master BMC; and the designation of the BMC other than the first BMC to serve in the role of slave BMC.
In another example, the algorithm's determination is performed by at least one of a firmware component and a software application.
In another example, the algorithm's determination uses an identification number.
In one example, the first BMC no longer serves in the role of master BMC due to one of a failure, a physical removal, and a direction from a system component.
In another example, at least one of the first BMC and the BMC other than the first BMC is configured remotely through a network interface.
In one example, the first BMC is configured to perform at least one of monitoring, managing, supporting, and controlling aspects of a node.
In still another example, the first BMC is configured to perform at least one of monitoring, managing, supporting, and controlling aspects of a plurality of nodes.
Embodiments may also include at least one machine-readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to carry out any example of the above method. Embodiments may also include an apparatus for utilizing a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, including a processing component, a memory component including a first application, and a BMC configured to implement any example of the foregoing method. Embodiments may also include a system for utilizing a configurable and fault-tolerant BMC arrangement in a multi-node system, including a chassis that includes a plurality of nodes and a power supply, and a node server that includes a processing component, a memory component including a first application, and a BMC configured to implement any example of the foregoing method.
Another embodiment may provide a method of utilizing a configurable and fault-tolerant baseboard management controller arrangement in a multi-node system, including designating a first controller of a plurality of controllers to the role of master controller and designating a controller other than the first controller to the role of slave controller, determining that the first controller no longer serves in the role of master controller, and assuming, by the controller other than the first controller, the role of master controller.
In one example, the method may include transmitting, by the first controller, information relating to the first controller's role as master controller.
In one example, the method may include designating the controller other than the first controller to serve in the role of master controller.
In still another example, the method may include using, by the controller other than the first controller, the information relating to the first controller's role as master controller in order to assume the role of master controller.
In one example, the method may include implementing a timeout period to allow objections to the designation of the controller other than the first controller to serve in the role of master controller.
In still another example, the role of master controller includes serving as a central interface for at least one of monitoring, managing, supporting, and controlling aspects of the multi-node system.
In still another example, at least one of the following is determined based on an algorithm: the designation of the first controller of the plurality of controllers to serve in the role of master controller; and the designation of the controller other than the first controller to serve in the role of slave controller.
In one example, the algorithm's determination is performed by at least one of a firmware component and a software application.
In still another example, the algorithm's determination uses an identification number.
In one example, the first controller no longer serves in the role of master controller due to one of a failure, a physical removal, and a direction from a system component.
In still another example, at least one of the first controller and the controller other than the first controller is configured remotely through a network interface.
In one example, the first controller is configured to perform at least one of monitoring, managing, supporting, and controlling aspects of a node.
In another example, the first controller is configured to perform at least one of monitoring, managing, supporting, and controlling aspects of a plurality of nodes.
Embodiments may also include at least one machine-readable medium comprising a plurality of instructions for utilizing a configurable and fault-tolerant controller arrangement in a multi-node system, wherein the instructions, in response to being executed on a computing device, cause the computing device to carry out any example of the above method. Embodiments may also include an apparatus for utilizing a configurable and fault-tolerant controller in a multi-node system, including a processing component, a memory component including a first application, and a controller configured to implement any example of the foregoing method.
Embodiments may also include a system for utilizing a configurable and fault-tolerant controller arrangement in a multi-node system, including a chassis that includes a plurality of nodes and a power supply, and a node server that includes a processing component, a memory component including a first application, and a controller configured to implement any example of the foregoing method.
Still another embodiment may include at least one computer-readable storage medium comprising a set of instructions for using a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system. If executed by a processor, the set of instructions causes a computer to detect a plurality of BMCs, designate a first BMC of the plurality of BMCs to the role of master BMC, and designate a BMC other than the first BMC to the role of slave BMC. If executed, the set of instructions also causes the computer to transmit, by the first BMC, information relating to the first BMC's role as master BMC, determine that the first BMC can no longer serve in the role of master BMC, and designate the BMC other than the first BMC to serve in the role of master BMC. If executed, the set of instructions also causes the BMC other than the first BMC to assume the role of master BMC, using the information relating to the first BMC's role as master BMC to do so.
Another embodiment may include an apparatus for using a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, including a processing component, a memory component including a first application, and a BMC that includes a computer-readable storage medium comprising a set of instructions. If executed by a processor, the set of instructions causes a computer to detect a plurality of BMCs, designate a first BMC of the plurality of BMCs to the role of master BMC, designate a BMC other than the first BMC to the role of slave BMC, and transmit, by the first BMC, information relating to the first BMC's role as master BMC. If executed, the set of instructions also causes the computer to determine that the first BMC can no longer serve in the role of master BMC and to designate the BMC other than the first BMC to serve in the role of master BMC. If executed, the set of instructions also causes the BMC other than the first BMC to assume the role of master BMC, using the information relating to the first BMC's role as master BMC to do so.
Still another embodiment may include a system for utilizing a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, including a chassis that includes a plurality of nodes and a power supply, and a node server that includes a processing component, a memory component including a first application, and a BMC. The BMC may include a computer-readable storage medium comprising a set of instructions which, if executed by a processor, causes a computer to detect a plurality of BMCs, designate a first BMC of the plurality of BMCs to the role of master BMC, and designate a BMC other than the first BMC to the role of slave BMC. If executed, the set of instructions also causes the computer to transmit, by the first BMC, information relating to the first BMC's role as master BMC, determine that the first BMC can no longer serve in the role of master BMC, and designate the BMC other than the first BMC to serve in the role of master BMC. If executed, the set of instructions also causes the BMC other than the first BMC to assume the role of master BMC, using the information relating to the first BMC's role as master BMC to do so.
It will be apparent to those of ordinary skill in the art having the benefit of this disclosure that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the embodiments described herein. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Those of ordinary skill in the art will appreciate from the foregoing description that the broad techniques of the embodiments of the present invention can be implemented in a variety of forms. Therefore, while the embodiments of the present invention have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited, since other modifications will become apparent to those of ordinary skill in the art upon a study of the drawings, the specification, and the following claims.
In addition, in some of the drawings, signal conductors are represented with lines. Some lines may be thicker, to indicate more constituent signal paths, may have a number label, to indicate the number of constituent signal paths, and/or may have arrows at one or more ends, to indicate the primary direction of information flow. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding. Any represented signal line, whether or not it carries additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signaling scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments of the present invention are not limited to them. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well-known power/ground connections and other components may or may not be shown in the drawings, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring the embodiments, and also in view of the fact that the specifics of implementing such block diagram arrangements depend heavily on the platform within which the embodiment is to be implemented, i.e., such specifics should be well within the purview of one of ordinary skill in the art. Where specific details are set forth in order to describe example embodiments of the invention, it should be apparent to one of ordinary skill in the art that the embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative rather than limiting.
The term "coupled" may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical, or other connections. In addition, the terms "first", "second", etc. are used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
Several features and aspects of embodiments of the present invention have been illustrated and described in detail with reference to particular embodiments by way of example only, and not by way of limitation. Those of skill in the art will appreciate that alternative implementations of, and various modifications to, the disclosed embodiments are within the scope and contemplation of the present disclosure. Therefore, it is intended that the invention be considered as limited only by the scope of the appended claims.
Brief Description of the Drawings
The various advantages of the embodiments of the present invention will become apparent to those of ordinary skill in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
Fig. 1 is a block diagram of an example of a computing system that implements a configurable and fault-tolerant baseboard management controller arrangement according to an embodiment of the invention; and
Fig. 2 is a flowchart of an example of a method of utilizing a configurable and fault-tolerant baseboard management controller arrangement according to an embodiment of the invention.
Detailed Description
Turning now to Fig. 1, a block diagram of a computing system 10 that utilizes a configurable and fault-tolerant baseboard management controller arrangement is shown. The computing system 10 may include, among other things, a chassis 100, a first node server 200, a second node server 300, a third node server 400, and an input/output (I/O) expander 600. The computing system 10 may be coupled to a network 1100.
The chassis 100 may include a first node 101, a second node 102, and a third node 103. The nodes 101, 102, 103 may be any replaceable unit that includes one or more components. Examples of these components include, among other things, hard disk drives, baseboards, side boards, and docking boards.
The chassis 100 may also include a first fan 105, a second fan 106, and a third fan 107. The fans 105, 106, 107 may be used to cool the components in the chassis 100. The chassis may also include a power supply 104, which may be used to power the components in the chassis 100. As used herein, the fans 105, 106, 107 and the power supply 104 are examples of system resources. Other system resources may include hard disk drives, sensors, and storage backplanes.
The first node server 200 may be a computer server system configured to monitor, manage, support, and control aspects of the operation of the first node 101. Similarly, the second node server 300 and the third node server 400 may be configured to monitor, manage, support, and control aspects of the operation of the second node 102 and the third node 103, respectively.
The first node server 200 may include a first node memory component 201, a first node firmware component 202, and a first node processing component 203. Similarly, the second node server 300 may include a second node memory component 301, a second node firmware component 302, and a second node processing component 303. Likewise, the third node server 400 may include a third node memory component 401, a third node firmware component 402, and a third node processing component 403.
The first node memory component 201 may include a first node server application 204, which may be configured, among other things, to monitor, manage, support, and control aspects of the operation of the first node 101. Similarly, the second node memory component 301 may include a second node server application 304, which may be used in the same way with respect to the second node 102. Likewise, the third node memory component 401 may include a third node server application 404, which may be used in the same way with respect to the third node 103.
The first node server 200 may include a first node BMC 205, which may be configured to monitor, manage, support, and control aspects of the operation of the multi-node system. In this embodiment, it may be configured, among other things, to monitor, manage, support, and control aspects of the operation of its associated node (the first node 101).
For example, the first node BMC 205 may be configured, among other things, to transmit information relating to the operation of the first node 101 (e.g., power level, temperature reading, and voltage level information). In addition, the first node BMC 205 may be configured to facilitate interfacing between the first node 101 and any entity (e.g., the first node server application 204) configured to monitor, manage, support, and control the operation of the first node 101. The second node BMC 305 may be configured to operate similarly with respect to the second node 102. Likewise, the third node BMC 405 may be configured to operate similarly with respect to the third node 103.
The first node BMC 205 may include a first node BMC firmware component 206. Similarly, the second node BMC 305 may include a second node BMC firmware component 306. Likewise, the third node BMC 405 may include a third node BMC memory component 406, which may include a third node BMC software application 407. The first node BMC firmware component 206, the second node BMC firmware component 306, and the third node BMC software application 407 may be configured, among other things, to direct the supply of power to the chassis 100 by transmitting instructions to the power supply 104 over a power management bus 500.
The I/O expander 600 may, among other things, allow a BMC (e.g., the first node BMC 205) to detect whether a node (e.g., the first node 101) is present in the computing system 10. The I/O expander 600 may be coupled to the first node BMC 205, the second node BMC 305, and the third node BMC 405 by an inter-node BMC bus 700.
The network 1100 may be coupled to the first node BMC 205, the second node BMC 305, and the third node BMC 405 through a first network interface 800, a second network interface 900, and a third network interface 1000, respectively. These network interfaces may be used, among other things, to remotely configure the components of the computing system 10.
In embodiments of the present invention, any of the coupled BMCs may assume the role of "master" BMC of the multi-node system. Once a BMC has been designated master, it may act as the master BMC for all of the coupled nodes within the system (including its own associated node). In other words, the master BMC may serve as a central interface with respect to the operation of the multi-node system. Once a master BMC has been designated, any other BMC in the multi-node system may assume the role of "slave".
Examples of aspects of node operation that may be handled by the master BMC include, but are not limited to, monitoring system components (e.g., temperature, power), managing system components (e.g., communicating relevant data to the relevant system components), supporting system components (e.g., obtaining and installing firmware and software updates), and controlling system components (e.g., directing the configuration of system resources). Handling these aspects through a single master BMC may, among other things, reduce system congestion (e.g., traffic on a communication bus) and avoid redundancy (e.g., when installing software updates).
In embodiments of the present invention, an algorithm may be configured to designate the master BMC. The algorithm may determine, among other things, which BMC is initially designated master, and which BMC is designated as the new master once the current master BMC is no longer available. The current master BMC may become unavailable for various reasons, including a failure, a removal (e.g., a physical removal), or a direction from a system component. Indeed, as will be described in greater detail, a system component such as a firmware component (e.g., the first node BMC firmware component 206) or a software application (e.g., the third node BMC software application 407) may initially designate the master BMC, remove master status from the current master BMC, designate a new master BMC, and so on.
In some embodiments, the algorithm may designate the master BMC based on node identification numbers. For example, one such algorithm may designate the master BMC based on the lowest identification number. Thus, in the embodiment depicted in Fig. 1, the algorithm may designate the first node BMC 205 as master BMC first, then the second node BMC 305, and so on.
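As an illustration only, the lowest-identification-number rule described above might be sketched as follows. The function names `elect_master` and `assign_roles` and the use of plain integers as identification numbers are assumptions made for this sketch; the description does not prescribe a particular implementation.

```python
from typing import Dict, Iterable, Optional


def elect_master(available_ids: Iterable[int]) -> Optional[int]:
    """Return the lowest identification number, or None if no BMC is available."""
    ids = list(available_ids)
    return min(ids) if ids else None


def assign_roles(available_ids: Iterable[int]) -> Dict[int, str]:
    """Map each available BMC id to "master" or "slave" (hypothetical labels)."""
    ids = list(available_ids)
    master = elect_master(ids)
    return {bmc_id: ("master" if bmc_id == master else "slave") for bmc_id in ids}


# BMCs 2 and 3 are online first; BMC 1 joins later and has the lowest id.
print(assign_roles([2, 3]))
print(assign_roles([1, 2, 3]))
```

Because the rule depends only on the set of BMCs currently available, the same function can serve both the initial designation and the choice of a replacement master.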
In embodiments of the present invention, the master BMC may be designated and configured by various means. For example, the master BMC may be designated and configured by a firmware component (e.g., the first node firmware component 202) or by executing a software application (e.g., the first node server application 204).
In other embodiments, a BMC may be configured using an application through a host interface attached to any of the coupled nodes. An example of such an application is a setup utility, such as the Basic Input/Output System (BIOS).
A BMC may also be configured over a remote connection. For example, a network (such as the network 1100) may use a network interface (e.g., the first network interface 800) to designate and configure the master BMC remotely (e.g., over Ethernet, a local area network (LAN), etc.).
The master BMC may transmit information relating to its ownership of master status (e.g., configuration information) in order to facilitate transfer of the master role to another BMC, if necessary. Thus, for example, if the first node BMC 205 is designated master, it may periodically send this information to the slave BMCs (e.g., the second node BMC 305, the third node BMC 405) in order to facilitate a future transfer of the master role to a slave.
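One way to picture this periodic transfer of master-status information is sketched below. The `MasterState` fields, the sequence counter, and the `publish` helper are all hypothetical; the description states only that configuration-type information is sent to the slave BMCs, not how it is structured or transported.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MasterState:
    """Hypothetical snapshot of what the master BMC knows."""
    owner_id: int
    sequence: int = 0
    settings: Dict[str, str] = field(default_factory=dict)


class SlaveBMC:
    def __init__(self, bmc_id: int) -> None:
        self.bmc_id = bmc_id
        self.last_state: Optional[MasterState] = None

    def receive(self, state: MasterState) -> None:
        # Keep a private copy, and ignore stale or duplicate snapshots.
        if self.last_state is None or state.sequence > self.last_state.sequence:
            self.last_state = copy.deepcopy(state)


def publish(state: MasterState, slaves: List[SlaveBMC]) -> None:
    """Called periodically by the master: bump the sequence, send a snapshot."""
    state.sequence += 1
    for slave in slaves:
        slave.receive(state)


state = MasterState(owner_id=1, settings={"fan_mode": "auto"})
slaves = [SlaveBMC(2), SlaveBMC(3)]
publish(state, slaves)
```

Each slave holds its own copy of the latest snapshot, so a slave that later takes over does not depend on the failed master still being reachable.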
The arrangement and numbering of the blocks depicted in Fig. 1 are not intended to imply an order of operations to the exclusion of other possibilities. Those of skill in the art will appreciate that the systems and methods may be subject to various modifications and alterations.
For example, in the embodiment depicted in Fig. 1, one BMC (e.g., the first node BMC 205) may be attached primarily to one node (e.g., the first node 101). This need not be the case. Other embodiments of the present invention may allow a single BMC to monitor, manage, support, and control more than one node.
Turning now to Fig. 2, a flowchart of an exemplary method of using a configurable and fault-tolerant baseboard management controller arrangement according to an embodiment of the invention is shown. The method may be implemented as a set of logic instructions stored in a machine- or computer-readable storage medium (such as random access memory (RAM), read-only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc.), in configurable logic (such as programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), or complex programmable logic devices (CPLDs)), in fixed-functionality hardware using circuit technology (such as application-specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS), or transistor-transistor logic (TTL) technology), or in any combination thereof. For example, computer program code to carry out the operations shown in the method may be written in any combination of one or more programming languages, including an object-oriented programming language such as C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages.
In this embodiment, a first node BMC (such as the first node BMC 205 (Fig. 1)) may come online after a second node BMC (such as the second node BMC 305 (Fig. 1)) and a third node BMC (such as the third node BMC 405 (Fig. 1)). The first node BMC may have identification number 1, the second node BMC may have identification number 2, and so on.
The method may begin at processing block 2000. At processing block 2010, the second node BMC and the third node BMC may come online. At processing block 2020, the second node BMC and the third node BMC may detect each other's presence through an I/O expander (such as the I/O expander 600 (Fig. 1)). At processing block 2030, an algorithm implemented by the BMC firmware component of the second node BMC (such as the second node BMC firmware component 306 (Fig. 1)) may determine, based on the lowest node identification number, that the second node BMC should assume the role of master BMC. The third node BMC, which has the higher node identification number, may operate as a slave BMC.
At processing block 2040, the first node BMC may come online. At processing block 2050, the BMC firmware component of the first node BMC (such as the first node BMC firmware component 206 (Fig. 1)) may determine, based on the lowest identification number (i.e., 1), that it should assert master BMC status. At processing block 2060, the first node BMC may transmit a message announcing that it will assume the role of master BMC.
In processing block 2070, still the Section Point BMC with the operation of leading role's look can indicate it to abandon the role's of main BMC response to first node BMC transmission.This message for example also can comprise with Section Point BMC, as the relevant information of the role of main BMC (, system resource configuration, system status information etc.).In processing block 2080, thereby can starting time out period, first node BMC allow any system component to oppose that it bears the role of main BMC.
At processing block 2090, after the timeout period has elapsed without any objection, the first node BMC may assume the role of master BMC from the second node BMC. At processing block 2100, the first node BMC may transmit a communication (e.g., system status information) within the timeout period to indicate that its assumption of the role of master BMC is complete. At processing block 2110, the first node BMC may periodically transmit information relating to its ownership of the master status (e.g., configuration information), to facilitate transferring the role of master BMC if necessary.
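The hand-over of processing blocks 2060 to 2100 may be sketched, under stated assumptions, as shown below. The class `Node`, the function `request_takeover`, the tick-based timeout, and the `fan_profile` entry are hypothetical illustrations; the embodiments do not specify message formats or timer granularity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    role: str = "slave"
    master_info: dict = field(default_factory=dict)  # state tied to the master role


def request_takeover(claimant, current_master, objections, timeout_ticks=3):
    """Simulate blocks 2060-2100 and return True if the claimant becomes master.

    `objections` is a set of tick numbers at which some system component
    objects; it is a hypothetical stand-in for messages received during
    the timeout period.
    """
    if claimant.node_id >= current_master.node_id:
        return False  # in this embodiment only a lower ID claims master status
    # Block 2070: the current master's response carries its master-role state.
    handed_over = dict(current_master.master_info)
    # Blocks 2080-2090: wait out the timeout period; any objection aborts.
    for tick in range(timeout_ticks):
        if tick in objections:
            return False
    # Block 2090: no objection was raised, so the roles are swapped.
    current_master.role = "slave"
    claimant.role = "master"
    claimant.master_info = handed_over
    return True


node1 = Node(1)
node2 = Node(2, role="master", master_info={"fan_profile": "quiet"})
took_over = request_takeover(node1, node2, objections=set())
```

The sketch changes no roles until the timeout period has elapsed, so an objection leaves the previous master in place. That ordering is a design choice of the sketch, not a requirement stated in the description.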
At processing block 2120, the first node BMC may fail (e.g., due to a firmware failure). At processing block 2130, after the first node BMC fails to send a message within the timeout period, at least one of the other nodes may determine that the current master BMC (i.e., the first node BMC) may no longer be functioning, and may determine which node should be the new master BMC. In this case, the BMC firmware component of the second node BMC may determine that the second node BMC should assume the master role (i.e., based on the lowest identification number).
At processing block 2140, the second node BMC may claim master BMC status. At processing block 2150, the second node BMC may access the information relating to ownership of the master status that was previously transmitted by the first node BMC, to facilitate its assumption of the role of master BMC. At processing block 2160, the second node BMC may assume the role of master BMC. At processing block 2170, the process may end.
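The failure handling of processing blocks 2120 to 2160 may likewise be sketched as follows. The functions `master_has_failed` and `fail_over`, the numeric timestamps, and the `config` entry are assumptions made for illustration only.

```python
def master_has_failed(last_message_time, now, timeout):
    """Block 2130: the master is presumed failed if it was silent longer than `timeout`."""
    return (now - last_message_time) > timeout


def fail_over(alive_ids, master_id, last_message_time, now, timeout, cached_info):
    """Return (new_master_id, info_for_new_master).

    `cached_info` is the ownership information that the master periodically
    transmitted (block 2110) and that the other nodes retained.
    """
    if not master_has_failed(last_message_time, now, timeout):
        return master_id, cached_info  # the master is still sending messages
    survivors = [n for n in alive_ids if n != master_id]
    if not survivors:
        return None, cached_info  # no BMC is left to take over
    # Blocks 2130-2160: the lowest remaining ID takes over, using the cached state.
    return min(survivors), dict(cached_info)


new_master, info = fail_over(
    alive_ids=[1, 2, 3],
    master_id=1,
    last_message_time=0.0,
    now=10.0,
    timeout=5.0,
    cached_info={"config": "A"},
)
```

Because the new master starts from the information last transmitted by the failed master, its view may lag by up to one transmission interval; the sketch does not attempt to reconcile that gap.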
The order and numbering of the processing blocks depicted in Fig. 2 are not intended to imply an order of operations to the exclusion of other possibilities. Those skilled in the art will appreciate that various modifications and changes may be made to the systems and methods.
For example, in the embodiment described above, the algorithm implemented by the BMC firmware components may claim master BMC status based on the lowest node identification number (i.e., processing block 2020). However, this need not be the case. In other embodiments, a BMC having a lower identification number may be designated as master BMC only after the current master BMC has failed.
Likewise, in the embodiment described above, the second node BMC may transmit to the first node BMC a response indicating that it relinquishes the role of master BMC (i.e., processing block 2070). However, this need not be the case. In other embodiments, before sending this message, the second node BMC may first indicate that it is "busy" (e.g., in the middle of a power supply update). If the second node BMC indicates that it may be busy, the requesting first node BMC may periodically resend its request to assume the role of master BMC until the request succeeds.
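The "busy" variation may be sketched as a simple retry loop. The function `takeover_with_retry`, the reply strings, and the `max_attempts` bound are hypothetical; the description itself says only that the request is resent periodically until it succeeds, and a real implementation would wait between attempts.

```python
def takeover_with_retry(send_request, max_attempts=5):
    """Resend the takeover request until the current master relinquishes.

    `send_request(attempt)` is an assumed callback that returns either
    'busy' or 'relinquish'. Returns the attempt number that succeeded,
    or None if every attempt was answered with 'busy'.
    """
    for attempt in range(1, max_attempts + 1):
        if send_request(attempt) == "relinquish":
            return attempt
        # A real implementation would sleep for the resend period here.
    return None


# The current master is busy twice (e.g., mid-update), then relinquishes.
replies = iter(["busy", "busy", "relinquish"])
succeeded_on = takeover_with_retry(lambda attempt: next(replies))
```

Bounding the number of attempts is an addition of the sketch, intended to keep the example terminating; it is not a limit taken from the embodiments.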

Claims (29)

1. A method of utilizing a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, comprising:
detecting a plurality of BMCs;
designating a first BMC of the plurality of BMCs to the role of master BMC, and designating a BMC other than the first BMC to the role of slave BMC;
transmitting, by the first BMC, information relating to the first BMC's role as master BMC;
determining that the first BMC no longer serves in the role of the master BMC;
designating the BMC other than the first BMC to serve in the role of the master BMC;
assuming, by the BMC other than the first BMC, the role of the master BMC; and
utilizing, by the BMC other than the first BMC, the information relating to the first BMC's role as master BMC to assume the role of the master BMC.
2. the method for claim 1, comprises and implements time out period to allow to oppose that the described BMC of described appointment except a described BMC serves as the role of described main BMC.
3. the method for claim 1, wherein the role of described main BMC comprises that at least one item in monitoring, management, support and the control aspect with respect to described multi-node system serves as central interface.
4. the method for claim 1, wherein based on algorithm, determine following at least one: the BMC in the described a plurality of BMC of described appointment serves as the role of main BMC; And the BMC the described BMC of described appointment in described a plurality of BMC serves as the role from BMC.
5. method as claimed in claim 4, wherein, described algorithm determines that at least one item of being in fastener components and software application carries out.
6. method as claimed in claim 4, wherein, described algorithm is determined and is utilized identification number.
7. the method for claim 1, wherein due to one in fault, physical removal and the indication of system component, a described BMC no longer serves as the role of described main BMC.
8. the method for claim 1, wherein at least one in a described BMC and the described BMC except a described BMC remotely configured by network interface.
9. the method for claim 1, wherein a described BMC is configured at least one in monitoring, management, support and the control aspect of node.
10. the method for claim 1, wherein a described BMC is configured at least one in monitoring, management, support and the control aspect of a plurality of nodes.
11. At least one machine-readable medium comprising a plurality of instructions for utilizing a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, wherein the instructions, in response to being executed on a computing device, cause the computing device to carry out the method of any one of claims 1 to 10.
12. An apparatus utilizing a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, comprising:
a processing component;
a memory component including a first application; and
a BMC configured to perform the method of any one of claims 1 to 10.
13. A system utilizing a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, comprising:
a rack including a plurality of nodes and a power supply; and
a node server, including:
a processing component;
a memory component including a first application; and
a BMC configured to perform the method of any one of claims 1 to 10.
14. A method of utilizing a configurable and fault-tolerant baseboard management controller (BMC) arrangement in a multi-node system, comprising:
designating a first controller of a plurality of controllers to the role of master controller, and designating a controller other than the first controller to the role of slave controller;
determining that the first controller no longer serves in the role of the master controller; and
assuming, by the controller other than the first controller, the role of the master controller.
15. The method of claim 14, further comprising transmitting, by the first controller, information relating to the first controller's role as master controller.
16. The method of claim 14, further comprising designating the controller other than the first controller to serve in the role of the master controller.
17. The method of claim 14, further comprising utilizing, by the controller other than the first controller, information relating to the first controller's role as master controller to assume the role of the master controller.
18. The method of claim 16, comprising implementing a timeout period to allow objection to the designating of the controller other than the first controller to serve in the role of the master controller.
19. The method of claim 14, wherein the role of the master controller includes serving as a central interface with respect to at least one of monitoring, management, support, and control aspects of the multi-node system.
20. The method of claim 16, wherein at least one of the following is determined based on an algorithm: the designating of the first controller of the plurality of controllers to serve in the role of master controller; and the designating of the controller other than the first controller of the plurality of controllers to serve in the role of slave controller.
21. The method of claim 20, wherein the algorithmic determination is performed by at least one of a firmware component and a software application.
22. The method of claim 20, wherein the algorithmic determination utilizes an identification number.
23. The method of claim 14, wherein the first controller no longer serves in the role of the master controller due to one of a failure, a physical removal, and an indication from a system component.
24. The method of claim 14, wherein at least one of the first controller and the controller other than the first controller is remotely configured through a network interface.
25. The method of claim 14, wherein the first controller is configured for at least one of monitoring, management, support, and control aspects of a node.
26. The method of claim 14, wherein the first controller is configured for at least one of monitoring, management, support, and control aspects of a plurality of nodes.
27. At least one machine-readable medium comprising a plurality of instructions for utilizing a configurable and fault-tolerant controller arrangement in a multi-node system, wherein the instructions, in response to being executed on a computing device, cause the computing device to carry out the method of any one of claims 14 to 26.
28. An apparatus utilizing a configurable and fault-tolerant controller in a multi-node system, comprising:
a processing component;
a memory component including a first application; and
a controller configured to perform the method of any one of claims 14 to 26.
29. A system utilizing a configurable and fault-tolerant controller arrangement in a multi-node system, comprising:
a rack including a plurality of nodes and a power supply; and
a node server, including:
a processing component;
a memory component including a first application; and
a controller configured to perform the method of any one of claims 14 to 26.
CN201280071730.XA 2012-03-28 2012-03-28 Utilize the methods, devices and systems of configurable and fault-tolerant baseboard management controller arrangement Active CN104169905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711408176.0A CN107977299B (en) 2012-03-28 2012-03-28 Method and system for baseboard management controller arrangement using configurable and fault tolerant

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2012/030958 WO2013147767A1 (en) 2012-03-28 2012-03-28 Configurable and fault-tolerant baseboard management controller arrangement

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201711408176.0A Division CN107977299B (en) 2012-03-28 2012-03-28 Method and system for baseboard management controller arrangement using configurable and fault tolerant

Publications (2)

Publication Number Publication Date
CN104169905A true CN104169905A (en) 2014-11-26
CN104169905B CN104169905B (en) 2019-06-11

Family

ID=49260833

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201280071730.XA Active CN104169905B (en) 2012-03-28 2012-03-28 Utilize the methods, devices and systems of configurable and fault-tolerant baseboard management controller arrangement
CN201711408176.0A Active CN107977299B (en) 2012-03-28 2012-03-28 Method and system for baseboard management controller arrangement using configurable and fault tolerant

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201711408176.0A Active CN107977299B (en) 2012-03-28 2012-03-28 Method and system for baseboard management controller arrangement using configurable and fault tolerant

Country Status (4)

Country Link
US (1) US9772912B2 (en)
CN (2) CN104169905B (en)
DE (1) DE112012006150T5 (en)
WO (1) WO2013147767A1 (en)

Also Published As

Publication number Publication date
WO2013147767A1 (en) 2013-10-03
CN104169905B (en) 2019-06-11
US9772912B2 (en) 2017-09-26
CN107977299B (en) 2022-01-25
CN107977299A (en) 2018-05-01
US20140229758A1 (en) 2014-08-14
DE112012006150T5 (en) 2015-01-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant