WO2022078519A1 - Computer device and management method - Google Patents

Computer device and management method

Info

Publication number
WO2022078519A1
Authority
WO
WIPO (PCT)
Prior art keywords
state
service
bmc
service node
node
Prior art date
Application number
PCT/CN2021/124249
Other languages
English (en)
French (fr)
Inventor
刘兴森
宋铜铃
牛元君
李安
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21879560.7A priority Critical patent/EP4213017A4/en
Publication of WO2022078519A1 publication Critical patent/WO2022078519A1/zh
Priority to US18/298,739 priority patent/US20230244550A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/461 Saving or restoring of program or task context
    • G06F9/463 Program control block organisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/177 Initialisation or configuration control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/52 Indexing scheme relating to G06F9/52
    • G06F2209/522 Manager

Definitions

  • the present invention relates to the field of computers, and in particular, to a computer device and a management method.
  • ATCA Advanced Telecommunications Computing Architecture
  • the present application provides a computer device and a management method in which service nodes, rather than dedicated management nodes, manage the computer device. This effectively improves the integration of the computer device and saves the work of developing a management node.
  • the present application provides a computer device. The computer device includes chassis common components and a plurality of service nodes. Each service node includes a baseboard management controller (BMC), and the BMC is connected to the chassis common components.
  • BMC Baseboard Management Controller
  • the service node can directly manage and operate the chassis common components, eliminating the need for independent management nodes. This greatly improves the space utilization of the computer equipment and removes the work of separately developing management nodes.
  • the BMC of any one of the multiple service nodes is used to manage the chassis common components when in the master state. Only the BMC of the service node in the master state has the right to manage the chassis common components, which effectively prevents the operation conflicts that could occur if multiple service nodes managed the chassis common components at the same time.
  • the BMC of any one of the multiple service nodes is also used to manage the multiple service nodes when in the master state.
  • the service node in the master state can also manage all the service nodes, which ensures that it can manage the entire computer equipment.
  • the BMC of each service node is also used to run a first management sub-module and a second management sub-module. The first management sub-module is used to manage this service node. The state of the second management sub-module includes a working state and a standby state; when the second management sub-module is in the working state, it is used to manage the chassis common components and the service nodes other than this service node.
  • when the running second management sub-module is in the working state, the BMC of the service node is in the master state; when the running second management sub-module is in the standby state, the BMC of the service node is in the slave state.
  • the BMCs of any two service nodes in the plurality of service nodes are connected, and any one of the plurality of service nodes can be accessed by a user to manage the computer equipment.
  • the technical solution ensures that when the state of a service node changes, the user can keep managing the computer equipment without going on site to unplug the cable and re-plug it into the interface of the service node that is newly in the master state.
  • each service node further includes a logic circuit, the logic circuits of any two service nodes among the multiple service nodes are connected, and the logic circuit is used to obtain the status information of the multiple service nodes.
  • the logic circuit can quickly process the electrical signals related to the state, obtain the state information, and continuously refresh the state information.
  • the logic circuits of any two service nodes are connected to ensure the rapid synchronization of the state information of any two service nodes.
  • the logic circuit includes: a complex programmable logic device (CPLD), a local area network switching device or a controller area network (Controller Area Network, CAN) circuit.
  • CPLD complex programmable logic device
  • CAN Controller Area Network
  • the manner in which the logic circuits of two service nodes among the multiple service nodes are connected includes: full interconnection, or bus interconnection.
  • the multiple service nodes are also used to select the BMC of one service node to enter the master state according to the master selection principle when no BMC of the multiple service nodes is in the master state.
  • through the master selection principle, the BMC of a service node is quickly selected to enter the master state and manage the computer equipment, which prevents the computer equipment from working abnormally for lack of management operations.
  • the multiple service nodes are also used to select the BMC of one service node to enter the master state according to the master selection principle when the service node to which the BMC in the master state belongs is abnormal, or when the BMC of the service node in the master state applies for a state switch.
  • the present application provides a method for managing computer equipment.
  • the computer equipment includes chassis common components and a plurality of service nodes; each service node includes a baseboard management controller (BMC), and the BMC is connected to the chassis common components.
  • the management method includes: when in the master state, the BMC of any one of the multiple service nodes manages the chassis common components.
  • the service node can directly manage and operate the chassis common components, eliminating the need for independent management nodes. This greatly improves the space utilization of the computer equipment and removes the work of separately developing management nodes. Only the BMC of the service node in the master state has the right to manage the chassis common components, which effectively prevents the operation conflicts that could occur if multiple service nodes managed the chassis common components at the same time.
  • the BMC of any one of the multiple service nodes when in the master state, also manages multiple service nodes.
  • the service node in the master state can also manage all the service nodes, which ensures that it can manage the entire computer equipment.
  • the BMC of each service node also runs a first management sub-module and a second management sub-module. The first management sub-module manages this service node. The state of the second management sub-module includes a working state and a standby state; when the second management sub-module is in the working state, it manages the chassis common components and the service nodes other than this service node.
  • when the second management sub-module is in the working state, the BMC of the service node is in the master state; when the second management sub-module is in the standby state, the BMC of the service node is in the slave state.
  • the BMCs of any two service nodes among the plurality of service nodes are connected, and the computer equipment is managed through access via any one of the plurality of service nodes.
  • the technical solution ensures that when the state of a service node changes, the user can keep managing the computer equipment without going on site to unplug the cable and re-plug it into the interface of the service node that is newly in the master state.
  • each service node further includes a logic circuit, the logic circuits of any two service nodes among the plurality of service nodes are connected, and the logic circuit obtains state information of the plurality of service nodes.
  • the logic circuit can quickly process the electrical signals related to the state, obtain the state information, and continuously refresh the state information.
  • the logic circuits of any two service nodes are connected to ensure the rapid synchronization of the state information of any two service nodes.
  • when the BMC of no service node in the plurality of service nodes is in the master state, the BMC of one service node is selected to enter the master state according to the master selection principle.
  • through the master selection principle, the BMC of a service node is quickly selected to enter the master state and manage the computer equipment, which prevents the computer equipment from working abnormally for lack of management operations.
  • the BMC of a service node is selected according to the master selection principle to enter the master state.
  • when the service node in the master state cannot continue to perform management operations, other service nodes can quickly take over the management task, ensuring the normal operation of the services of the computer equipment.
  • the master selection principle includes: counting votes, with the service node that receives the most votes entering the master state; when several service nodes receive the same number of votes, the one with the smallest slot number enters the master state.
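The master selection principle stated above (most votes wins, ties go to the smallest slot number) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `elect_master` and the representation of votes as a mapping from voter slot to candidate slot are assumptions made for the example, not part of the application.

```python
from collections import Counter


def elect_master(votes):
    """Pick the slot that enters the master state.

    `votes` maps each voter's slot number to the slot number it voted for
    (a hypothetical representation chosen for this sketch).
    """
    if not votes:
        raise ValueError("no votes cast")
    tally = Counter(votes.values())          # votes received per candidate slot
    top = max(tally.values())                # highest vote count
    # Tie-break: among the candidates with the most votes, smallest slot wins.
    return min(slot for slot, count in tally.items() if count == top)
```

For example, with four voters split evenly between slots 1 and 2, the call `elect_master({1: 2, 2: 2, 3: 1, 4: 1})` returns slot 1 because of the tie-break.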
  • the present application provides a computer-readable storage medium in which computer instructions are stored. When the computer instructions in the computer-readable storage medium are executed by a computer device, the computer device is made to perform the method of the second aspect or of any one of its feasible implementations, or to realize the function of the computer device of the first aspect or of any one of its feasible implementations.
  • FIG. 1 is a schematic diagram of a computer equipment structure in the prior art
  • FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of the application.
  • FIG. 3 is a schematic structural diagram of computer device access according to an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of another computer device access according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of switching between the master state and the slave state of a service node according to an embodiment of the present application
  • FIG. 6 is a flowchart of a method for periodically detecting service node status information in a computer device according to an embodiment of the present application
  • FIG. 7 is a flowchart of a method for arbitrating and selecting a service node to enter the master state according to an embodiment of the present application
  • FIG. 8 is a schematic diagram of a follow-up voting structure according to an embodiment of the present application.
  • FIG. 9 is a flowchart of a follow-up voting method according to an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a computer device in an ATCA form. Taking FIG. 1 as an example, the computer device 100 has four service nodes 110a, 110b, 110c, and 110d. The hardware structure of each service node is the same, as shown by the service node 110a, which includes a logic circuit 112 and a BMC 114.
  • the logic circuit 112 is used to detect the voltage signals and bus signals that indicate the state of the node, obtain the state information of the node, and communicate with the BMC 114 to receive and transmit the state information. The BMC 114 obtains the state information of the current node from the detection result of the logic circuit and from its own direct detection of the current node, and manages the current node.
  • the management node 107 is in the master state
  • the management node 109 is in the standby state.
  • the BMC 114 will transmit the management information of the node 110a to the management node 107 through the backplane 116, and similarly the BMCs of other service nodes will also transmit the management information of the node to the management node 107 through the backplane 116.
  • the management node 107 in the master state also obtains the status information of the chassis common components and manages them. For example, the speed of the fan module 101 is adjusted according to the temperature information of the computer device 100, the power supply of the power supply module 103 is adjusted according to the power consumption information of the computer device 100, and so on.
  • the management node 107 and the management node 109 have a master-slave relationship with each other.
  • when the management node 107 is in the master state, it has management authority over the entire computer device 100. At this time, the management node 109 is in the standby state and has no management authority over the entire computer device 100.
  • the management node 109 is promoted to the master state to replace the management node 107 in managing the computer equipment.
  • the management nodes 107 and 109 have independent hardware forms, which differ from those of the service nodes 110a, 110b, 110c, and 110d, and the management nodes 107 and 109 independently occupy two slots in the chassis 116. This leads to low integration and low space utilization of the computer equipment 100. Separate development and maintenance of management nodes also increases costs.
  • FIG. 2 is a schematic structural diagram of a computer device 200 according to an embodiment of the present application
  • the computer device 200 has four service nodes 209a, 209b, 209c, and 209d.
  • each service node includes a logic circuit 217, a BMC 215, and a CPU 219.
  • when the BMC 215 is powered on and started, it runs the first management sub-module 211 and the second management sub-module 213.
  • the BMC 215 is connected through the backplane 207 to all other service nodes and to the chassis common components, including the fan module 201, the power module 203, the chassis 205, and so on. Connected here means directly connected by a bus. Through this direct bus connection, the BMC 215 can directly collect information from, check the status of, and perform management operations on all nodes and all components of the entire computer equipment 200. It can be understood that a direct bus connection does not exclude driver chips and routing circuits on the bus; a bus connection that includes these chips and circuits is still a direct bus connection.
  • the logic circuit 217 is used to detect and obtain the status information of the service node 209a. It is connected in a fully interconnected manner to the logic circuits of all other service nodes, such as 209b, 209c, and 209d, and can obtain the state information of all other service nodes. The state information includes: master-slave state information, in-position information, health state information, heartbeat information, arbitration voting information, and so on.
  • the bus method can also be used to ensure that the logic circuits of any two service nodes are directly connected to perform direct communication and exchange status information.
  • CPLD CPLD
  • LAN switch chip a LAN switch chip
  • the first management sub-module 211 and the second management sub-module 213 are run, wherein the first management sub-module 213 is used to manage this service node, and the second management sub-module 211 is used to manage the service nodes other than this service node and all chassis common components.
  • the logic circuit 217 is connected to the BMC 215 through the bus, communicates with the BMC 215, and exchanges node status information and management operation instructions.
  • the hardware structure of the service node 209a also includes hardware components such as a storage medium and some buses, which are not shown in FIG. 2 .
  • the service nodes 209a, 209b, 209c, and 209d have the same hardware structure, and all are capable of managing the entire computer equipment 200. However, a service node can manage the entire computer device 200 only when its BMC is running and in the master state. The BMCs of the other service nodes are in the slave state, which means that they cannot manage the entire computer device 200. The slave state also means that the service node acts as a backup: when the service node in the master state cannot continue to perform management operations, a service node in the slave state switches to the master state and then performs management operations on the entire computer device 200, as explained in detail later.
  • when the BMC of a service node is in the master state, its second management sub-module 211 is in the working state
  • the working state means that the chassis common components and the other service nodes are being managed. While the BMCs of the other three service nodes are in the slave state, their second management sub-modules are in the standby state, and the standby state means that the second management sub-module does not manage the chassis common components or the other service nodes.
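The relationship described above between the state of the second management sub-module and the master or slave state of the BMC can be illustrated with a small Python sketch. The enum names, the function names, and the string labels for managed targets are hypothetical and chosen only for this example; the application does not prescribe any software interface.

```python
from enum import Enum


class SubModuleState(Enum):
    WORKING = "working"
    STANDBY = "standby"


class BmcState(Enum):
    MASTER = "master"
    SLAVE = "slave"


def bmc_state(second_submodule_state):
    """The BMC is master exactly when its second sub-module is working."""
    if second_submodule_state is SubModuleState.WORKING:
        return BmcState.MASTER
    return BmcState.SLAVE


def managed_targets(own_slot, all_slots, second_submodule_state):
    """List what one BMC manages (labels are illustrative assumptions).

    The first sub-module always manages the local node. The second
    sub-module adds the chassis common components and the other nodes,
    but only while it is in the working state.
    """
    targets = [f"node:{own_slot}"]
    if second_submodule_state is SubModuleState.WORKING:
        targets.append("chassis-common-components")
        targets += [f"node:{s}" for s in sorted(all_slots) if s != own_slot]
    return targets
```

In this sketch a BMC in the slave state still manages its own node, which matches the description of the first management sub-module.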
  • the technical solution of the present application is not limited to the division method of the first management sub-module and the second management sub-module in this embodiment.
  • for example, after the BMC is powered on and started, it can run three sub-modules: the first management sub-module manages the current node, the second management sub-module manages the chassis common components, and the third management sub-module manages the other service nodes. The function division is not limited to the above two solutions, and further examples are not given here.
  • the number of service nodes in the embodiment shown in FIG. 2 is 4 service nodes as an example, but in other embodiments, the number of service nodes may be any number.
  • the number of fan modules and the number of power modules are not limited, and can be any number.
  • Some components of the computer device are not shown in the computer device 200 in this embodiment, such as switching nodes, interface modules, etc. It is easy to understand that the computer device 200 also includes these nodes and components.
  • the above-mentioned service nodes are used to process customer services.
  • the service node may be a blade server node, a rack server node, or a switching node, a storage node, a computing node, and the like.
  • the chassis common components are components that provide the common resources of the chassis, including the chassis, power modules, fan modules, and interface modules. Note that when the service nodes are rack server nodes, a backplane is no longer required, and the rack server nodes can be connected to each other through cables.
  • the computer device 200 shown in FIG. 2 may be a blade server device, a switching device including multiple switching nodes, or a storage device including multiple storage nodes, and so on.
  • FIG. 3 shows a schematic structural diagram 300 of user access management to the computer equipment 200.
  • the first management module 213 and the second management module 211 are run, wherein the second management module 211 manages the other service nodes 308 and the chassis common components 306, and the first management module 213 manages this service node 310.
  • the user can manage the entire computer device 200 by accessing the interactive interface 302 of the BMC 215.
  • the interactive interface 302 may be: Simple Network Management Protocol (SNMP), Redfish, Intelligent Platform Management Interface (IPMI), a web interface, a command-line interface (CLI), and so on.
  • SNMP Simple Network Management Protocol
  • IPMI Intelligent Platform Management Interface
  • CLI command-line interface
  • the computer has a management node
  • the user can access the management node through a cable to manage the computer equipment.
  • there are only two management nodes, a main one and a standby one. After the user connects to both management nodes, even if the active and standby states of the management nodes are switched, there is no need to unplug and re-plug the cable, and the computer equipment can still be accessed and managed remotely.
  • the number of service nodes is usually more than two. For example, in the computer device 200 shown in FIG. 2 , the number of service nodes is 4, and it is not an efficient solution for a user to access so many service nodes at the same time through cables.
  • the embodiment of the present application provides a feasible solution. As shown in FIG.
  • the BMCs 410a and 410b of the two service nodes are connected.
  • the running first management module 404b manages this service node 408b. It should be noted that this service node refers to the service node to which the BMC belongs.
  • the running second management module 406b manages other service nodes 412 and chassis common components 414, performs management operations, and obtains management information.
  • the other service nodes 412 include service nodes to which the BMC 410a belongs.
  • the user still accesses the interactive interface 402a, but because the BMCs 410a and 410b are connected, the user can access the BMC 410b through 402a, and access the management information of the computer equipment 200, and perform management operations on the computer equipment 200.
  • the cross 416 shown in FIG. 4 means that the second management module 406a in the standby state does not manage other service nodes 412 and the common components 414 of the chassis at this time.
  • the first management module 404a manages the local service node 408a to which the BMC 410a belongs. Because the user does not directly access the BMC 410b, the interactive interface 402b of the BMC 410b may be idle at this time.
  • when the computer device 200 has multiple service nodes, the user can access any one of the service nodes and thereby access and manage the entire computer device 200.
  • the technical solution effectively avoids having to manually unplug the user's access cable and re-plug it into the service node in the master state when the state of a service node changes.
  • FIG. 5 shows a schematic flowchart of the switching of the state of the service node according to the embodiment of the present application.
  • each service node scans the status information of the service node and other service nodes in real time, and updates and synchronizes the status information of all nodes in time.
  • the scanning steps of this node are as follows:
  • in process step S606, the state information of the service node in the master state is obtained, and it is judged whether that node is abnormal or is applying to drop to the slave state. If it is normal and has not applied to drop to the slave state, jump to step S612. Here, an abnormal service node is one that cannot work normally and cannot perform management operations.
  • in process step S608, when the status information of the service node in the master state shows that the service node is abnormal or is applying to drop to the slave state, all service nodes, or all service nodes except the abnormal service node and the service node applying to drop to the slave state, enter the arbitration state;
  • the service node entering the arbitration state completes the arbitration operation, and the current node obtains the arbitration result;
  • in process step S612, the hardware heartbeat of the node is maintained to notify other service nodes of the health status of the node;
  • in process step S614, the state information of the node is updated, and the node remains in the slave state or enters the master state according to the arbitration result. Then return to step S604 and perform the scanning and verification operation of the next stage, with a duration of T2.
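The decision made in steps S606 and S608 above, namely whether arbitration is needed and which nodes take part, can be sketched as follows. The `NodeStatus` record and the function name are assumptions introduced for illustration. The sketch also picks the second of the two variants named in S608 (excluding abnormal nodes and nodes that applied to drop to the slave state), and treats the absence of any master as a trigger for arbitration.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    # Hypothetical per-node status record; field names are illustrative.
    slot: int
    is_master: bool = False
    healthy: bool = True
    requests_demotion: bool = False


def arbitration_participants(statuses):
    """Return the slots that enter the arbitration state, or [] if none must.

    S606: if a master exists and it is healthy and has not applied to drop
    to the slave state, no arbitration is needed (continue with S612).
    S608: otherwise every healthy node that did not apply for demotion
    enters arbitration.
    """
    masters = [s for s in statuses if s.is_master]
    if masters and all(m.healthy and not m.requests_demotion for m in masters):
        return []
    return sorted(
        s.slot for s in statuses if s.healthy and not s.requests_demotion
    )
```

A caller would run this once per scan period and, when the returned list is non-empty, start the arbitration among the listed slots.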
  • the purpose of the arbitration is to select a service node to enter the main state to manage the entire computer equipment.
  • FIG. 7 is a flowchart of a method, provided by an embodiment of the present application, for arbitrating and selecting a service node to enter the master state.
  • step S702 start to arbitrate the service nodes that have entered the arbitration state
  • each service node, or each service node entering the arbitration state, votes; then the number of votes is counted, and the votes are synchronized to each service node entering the arbitration state.
  • the service node with the most votes enters the master state, and entering the master state means that the service node manages the entire computer equipment.
  • when several service nodes have the same number of votes, their slot numbers are compared, and the service node with the smallest slot number is selected to enter the master state to manage the entire computer equipment.
  • the slot number is a common concept in the field, and is a label number representing the specific physical location of a node, a module, or a component in a computer device.
  • the state information of each service node or each service node entering the arbitration state is updated according to the result of voting based on the principle of electing a leader.
  • in process step S708, within a time period of length T1, the vote according to the principle of electing a leader is performed multiple times, and the voting results and the status information of each service node, or each service node entering the arbitration state, are verified.
  • the purpose of the verification is to repeatedly confirm the master selection result and prevent interference from noise.
  • the verification performed within one T1 time period counts as one verification operation.
  • in process step S712, the arbitration operation and the process are completed, and one service node is finally determined to enter the master state.
  • in this way, a service node that enters the master state can be selected accurately and stably.
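The repeated verification within a T1 period can be pictured as requiring every voting round in the window to produce the same winner before the result is accepted. The sketch below is one possible reading under that assumption; the text does not say how disagreeing rounds are handled, so returning `None` in that case is a choice made for the example, and the function names are hypothetical.

```python
from collections import Counter


def elect_once(votes):
    """One voting round: most votes wins, smallest slot number breaks ties."""
    tally = Counter(votes.values())
    top = max(tally.values())
    return min(slot for slot, count in tally.items() if count == top)


def verified_master(vote_rounds):
    """Confirm the election over several rounds sampled in one T1 window.

    `vote_rounds` is a list of non-empty mappings (voter slot -> candidate
    slot). Returns the elected slot only if every round agrees; returns
    None when the rounds disagree or no round was sampled, so that a
    transient glitch in one round cannot promote a node.
    """
    winners = {elect_once(votes) for votes in vote_rounds}
    return winners.pop() if len(winners) == 1 else None
```

A caller that receives `None` would simply repeat the verification in the next T1 window.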
  • some abnormal situations may occur, such as problems in the communication link between some nodes, and their own state information cannot be synchronized.
  • a follow-up voting mechanism is required to solve the problem that the status information of other service nodes cannot be obtained in this scenario, so as to ensure that these service nodes are still able to vote for other service nodes.
  • FIG. 8 is a schematic diagram illustrating a follow-up voting structure according to an embodiment of the present application.
  • service node X cannot obtain the status information of service node Y, so service node X cannot vote for service node Y in the conventional way. In this scenario, the status information of service node Y can be transmitted to service node X through service node Z, so that X is able to vote on service node Y, and the voting result is synchronized to service node Y through service node Z.
  • the state information of each service node can still be transmitted to every service node through link communication via intermediate nodes, and the voting results of each service node are synchronized to every service node in the same way.
  • FIG. 9 is a flowchart of a follow-up voting method according to an embodiment of the present application, and describes the follow-up voting mechanism in detail.
  • the service node X follow-up process starts.
  • the service node Y satisfies the principle of master selection, which has been described in detail above, and will not be repeated here.
  • step S906 it is determined whether the service node Y is not in place. If the service node X detects that the state information of the service node Y cannot be directly obtained, and the judgment result is that the service node is not in place, then jump to step S908; if the service node X detects that the state information of the service node Y can be directly obtained, the judgment result is that the service If the node is in place, jump to step S912.
  • step S908 check whether the service node Z that has voted for the service node Y works normally, if so, skip to the process step S912, if not, skip to the process step S910. It needs to be further explained that when it is checked that the service node Z is not working properly, the vote that has been cast for the service node Y is invalid.
  • the service node X cannot judge the status information of the service node Y, so the service node X refuses to vote for the service node Y.
  • the service node X can obtain and judge the status information of the service node Y, so the service node X agrees to vote for the service node Y.
  • step S914 when the business node X agrees or refuses to vote for the business node Y, the vote following process of the business node X ends.
  • the intermediate service node for example, service node Z in this embodiment is service node X and service node Y.
  • the intermediate nodes of these business nodes transmit the status information and voting results of these business nodes to each business node through indirect methods.
  • It should be understood that the functional modules in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more modules may be integrated into one module.
  • The integrated modules can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
  • The computer instructions of modules implemented in the form of software functional modules can be stored in a computer-readable storage medium.
  • The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute some steps of the methods described in the embodiments of the present invention.
  • The storage medium may be a readable non-volatile storage medium, including a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Hardware Redundancy (AREA)

Abstract

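A computer device (200) and a management method. The computer device (200) includes chassis common components and a plurality of service nodes (209a, 209b, 209c, 209d). Each service node (209a, 209b, 209c, 209d) includes a baseboard management controller (BMC) (215), and the BMC (215) is connected to the chassis common components. When in a master state, the BMC (215) of any one of the service nodes (209a, 209b, 209c, 209d) can manage the chassis common components and all of the service nodes (209a, 209b, 209c, 209d). Because the computer device (200) is managed by its service nodes (209a, 209b, 209c, 209d) rather than by an independent management node, the integration level of the computer device (200) is effectively improved, development work is saved, and the development cost of the computer device (200) is reduced.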

Description

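Computer device and management method
Technical Field
The present invention relates to the field of computers, and in particular to a computer device and a management method.
Background
A conventional computer device, for example one in the Advanced Telecommunications Computing Architecture (ATCA) form factor, provides slots for independent service nodes and management nodes that differ in form. The service nodes process customer services. The management nodes manage the chassis common components, such as fans and power supplies, and also monitor the state of every node in the chassis.
However, as computer devices become more highly integrated and chassis become smaller, the requirement on chassis space utilization has risen sharply. The management node responsible for managing the computer device still occupies dedicated, independent space, so the space utilization of the computer device cannot be improved further. In addition, the chassis of multi-node computer devices now come in many hardware forms, management nodes for different chassis cannot share a unified form, and each kind of management node must be developed separately. This greatly increases the software and hardware development and maintenance costs of the computer device.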
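Summary
This application provides a computer device management method and system in which service nodes replace the management node in managing the computer device. This effectively improves the integration level of the computer device and removes the need to develop a management node.
According to a first aspect, this application provides a computer device. The computer device includes chassis common components and a plurality of service nodes. Each service node includes a baseboard management controller (BMC), and the BMC is connected to the chassis common components. Because the BMC of a service node is directly connected to the chassis common components, the service node can manage the chassis common components directly. The independent management node is eliminated, the space utilization of the computer device is greatly improved, and the work of developing a separate management node is saved.
In a possible implementation, the BMC of any one of the plurality of service nodes is configured to manage the chassis common components when it is in a master state. Only the BMC of the service node in the master state has permission to manage the chassis common components, which effectively prevents the operation conflicts that could arise if several service nodes managed the chassis common components at the same time.
In a possible implementation, the BMC of any one of the plurality of service nodes is further configured to manage the plurality of service nodes when it is in the master state. The service node in the master state can thus manage all of the service nodes, and therefore the entire computer device.
In a possible implementation, the BMC of each service node is further configured to run a first management submodule and a second management submodule. The first management submodule manages the local service node, that is, the service node to which the BMC belongs. The second management submodule has two states: a working state and a standby state. When in the working state, the second management submodule manages the chassis common components and the service nodes other than the local service node.
In a possible implementation, when the running second management submodule is in the working state, the BMC of the local service node is in the master state; when the running second management submodule is in the standby state, the BMC of the local service node is in a slave state.
In a possible implementation, the BMCs of any two of the plurality of service nodes are connected to each other, and any one of the plurality of service nodes can be accessed by a user to manage the computer device. With this solution, when the state of a service node changes, the user does not need to go on site and re-plug a cable into an interface of the service node that is newly in the master state in order to manage the computer device.
In a possible implementation, each service node further includes a logic circuit, the logic circuits of any two of the plurality of service nodes are connected to each other, and the logic circuit is configured to obtain state information of the plurality of service nodes. The logic circuit can quickly process state-related electrical signals, derive the state information, and continuously refresh it. Because the logic circuits of any two service nodes are connected, the state information of any two service nodes can be synchronized quickly.
In a possible implementation, the logic circuit includes a complex programmable logic device (CPLD), a local area network (LAN) switching device, or a controller area network (CAN) circuit.
In a possible implementation, the logic circuits of two of the plurality of service nodes are connected by full interconnection or by bus interconnection.
In a possible implementation, the plurality of service nodes are further configured to, when no service node among them has a BMC in the master state, select the BMC of one service node to enter the master state according to a master selection principle. The master selection principle quickly puts the BMC of one service node into the master state to manage the computer device, which prevents the computer device from working abnormally for lack of management operations.
In a possible implementation, the plurality of service nodes are further configured to, when the service node to which the BMC in the master state belongs becomes abnormal, or when the BMC of the service node in the master state requests a state switch, select the BMC of one service node to enter the master state according to the master selection principle. With this solution, when the service node in the master state cannot continue to perform management operations, another service node can quickly take over the management task, ensuring that the services of the computer device run normally.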
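According to a second aspect, this application provides a management method for a computer device. The computer device includes chassis common components and a plurality of service nodes. Each service node includes a baseboard management controller (BMC), and the BMC is connected to the chassis common components. The management method includes: when in a master state, the BMC of any one of the plurality of service nodes manages the chassis common components. Because the BMC of a service node is directly connected to the chassis common components, the service node can manage the chassis common components directly. The independent management node is eliminated, the space utilization of the computer device is greatly improved, and the work of developing a separate management node is saved. Only the BMC of the service node in the master state has permission to manage the chassis common components, which effectively prevents the operation conflicts that could arise if several service nodes managed the chassis common components at the same time.
In a possible implementation, when in the master state, the BMC of any one of the plurality of service nodes also manages the plurality of service nodes. The service node in the master state can thus manage all of the service nodes, and therefore the entire computer device.
In a possible implementation, the BMC of each service node also runs a first management submodule and a second management submodule. The first management submodule manages the local service node. The second management submodule has two states: a working state and a standby state. When in the working state, the second management submodule manages the chassis common components and the service nodes other than the local service node.
In a possible implementation, when the second management submodule is in the working state, the BMC of the local service node is in the master state; when the second management submodule is in the standby state, the BMC of the local service node is in a slave state.
In a possible implementation, the BMCs of any two of the plurality of service nodes are connected to each other, and the computer device is managed by accessing any one of the plurality of service nodes. With this solution, when the state of a service node changes, the user does not need to go on site and re-plug a cable into an interface of the service node that is newly in the master state in order to manage the computer device.
In a possible implementation, each service node further includes a logic circuit, the logic circuits of any two of the plurality of service nodes are connected to each other, and the logic circuit obtains state information of the plurality of service nodes. The logic circuit can quickly process state-related electrical signals, derive the state information, and continuously refresh it. Because the logic circuits of any two service nodes are connected, the state information of any two service nodes can be synchronized quickly.
In a possible implementation, when no service node among the plurality of service nodes has a BMC in the master state, the BMC of one service node is selected to enter the master state according to a master selection principle. The master selection principle quickly puts the BMC of one service node into the master state to manage the computer device, which prevents the computer device from working abnormally for lack of management operations.
In a possible implementation, when the service node to which the BMC in the master state belongs becomes abnormal, or when the BMC of the service node in the master state requests a state switch, the BMC of one service node is selected to enter the master state according to the master selection principle. With this solution, when the service node in the master state cannot continue to perform management operations, another service node can quickly take over the management task, ensuring that the services of the computer device run normally.
In a possible implementation, the master selection principle includes: votes are counted, and the service node with the most votes enters the master state; when votes are tied, the tied service node with the smallest slot number enters the master state.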
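The master selection principle stated above (most votes wins, ties go to the smallest slot number) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `elect_master` and the representation of votes as a mapping from voter slot to candidate slot are hypothetical and do not come from this application.

```python
from collections import Counter
from typing import Dict


def elect_master(votes: Dict[int, int]) -> int:
    """Return the slot number of the node that should enter the master state.

    `votes` maps the slot number of each voting node to the slot number of
    the node it voted for (an assumed, illustrative representation).
    """
    if not votes:
        raise ValueError("no votes were cast")
    tally = Counter(votes.values())
    # Sort key: more votes first (negated count), then the smaller slot number.
    return min(tally, key=lambda slot: (-tally[slot], slot))
```

For example, `elect_master({1: 3, 2: 1, 3: 3, 4: 1})` has slots 1 and 3 tied at two votes each, so slot 1 is selected.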
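According to a third aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions. When the computer instructions in the computer-readable storage medium are executed by a computer device, the computer device performs the method in any possible implementation of the second aspect, or implements the functions of the computer device in any possible implementation of the first aspect.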
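Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a computer device in the prior art;
FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of access to a computer device according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of another form of access to a computer device according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a service node switching between the master state and the slave state according to an embodiment of this application;
FIG. 6 is a flowchart of a method for periodically detecting service node state information in a computer device according to an embodiment of this application;
FIG. 7 is a flowchart of a method for selecting, by arbitration, the service node that enters the master state according to an embodiment of this application;
FIG. 8 is a schematic diagram of a follow-vote structure according to an embodiment of this application;
FIG. 9 is a flowchart of a follow-vote method according to an embodiment of this application.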
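Detailed Description
In the field of computer device management, how to manage a computer device efficiently and accurately, so that the customer services running on it are not affected, has long been an important problem. A computer device in the ATCA form factor uses the existing management architecture. Such a device mainly includes two separate management nodes that act as master and standby for each other, and the management nodes manage all nodes and the chassis common components. FIG. 1 is a schematic structural diagram of a computer device in the ATCA form factor. Taking FIG. 1 as an example, the computer device 100 has four service nodes 110a, 110b, 110c, and 110d. All service nodes have the same hardware structure. As shown for service node 110a, the node includes a logic circuit 112 and a BMC 114. The logic circuit 112 detects the voltage signals and bus signals that indicate the state of the local node, obtains the state information of the local node, and communicates with the BMC 114 to receive and pass on this state information. Based on the detection result of the logic circuit and on its own direct detection of the local node, the BMC 114 obtains the state information of the local node and manages the local node. When management node 107 is in the master state, management node 109 is in the standby state. The BMC 114 passes the management information of node 110a to management node 107 through the backplane 116, and the BMCs of the other service nodes likewise pass the management information of their nodes to management node 107 through the backplane 116. In addition, for the chassis common components, such as the fan module 101, the power module 103, and the chassis 107, the management node 107 in the master state also obtains their state information and manages them. For example, it adjusts the fan speed of the fan module 101 according to the temperature information of the computer device 100, and adjusts the supply power of the power module 103 according to the power consumption information of the computer device 100.
Management node 107 and management node 109 are master and standby for each other. When management node 107 is in the master state, it has management authority over the entire computer device 100, and management node 109 is in the standby state and has no such authority. When management node 107 becomes abnormal, however, management node 109 is promoted to the master state and manages the computer device in place of management node 107.
Management nodes 107 and 109 have their own hardware form, which differs from that of service nodes 110a, 110b, 110c, and 110d, and they occupy two slots in the chassis 116 by themselves. As a result, the integration level and space utilization of the computer device 100 are low. Developing and maintaining the management nodes separately also increases cost.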
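Therefore, this application proposes a feasible technical solution. FIG. 2 is a schematic structural diagram of a computer device 200 according to an embodiment of this application. The computer device 200 has four service nodes 209a, 209b, 209c, and 209d, a backplane 207, a fan module 201, a power module 203, and a chassis 205, but no independent management node.
The service nodes have the same hardware structure. Taking service node 209a as an example, it includes a logic circuit 217, a BMC 215, and a CPU 219. After the BMC 215 is powered on and started, it runs a first management submodule 213 and a second management submodule 211.
The BMC 215 is connected, through the backplane 207, to all other service nodes and to the chassis common components, including the fan module 201, the power module 203, and the chassis 205. "Connected" here means directly connected by a bus. Through this direct bus connection, the BMC 215 can directly collect information from, check the state of, and perform management operations on all nodes and all components of the computer device 200. A direct bus connection does not exclude driver chips, routing circuits, and the like that may appear on the bus; a bus connection that includes such chips and circuits is still a direct bus connection.
The logic circuit 217 detects and obtains the state information of the local service node 209a. It is connected, by full interconnection, to the logic circuits of all other service nodes, for example 209b, 209c, and 209d, and can obtain their state information. The state information includes master/slave state information, in-place (presence) information, health state information, heartbeat information, arbitration vote information, and so on. Full interconnection ensures that the logic circuits of any two service nodes can communicate directly and exchange the state information of each service node.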
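The kinds of state information listed above can be pictured as one small record per service node. The sketch below is only an illustration: the `NodeState` type, its field names, and the `find_master` helper are assumptions made for this example, since the application names the categories of information but not a data layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeState:
    """Hypothetical per-node state record; field names are illustrative."""
    slot: int                        # slot number of the service node
    is_master: bool = False          # master/slave state information
    present: bool = True             # in-place (presence) information
    healthy: bool = True             # health state information
    heartbeat: int = 0               # heartbeat counter
    vote_for: Optional[int] = None   # arbitration vote (slot voted for)


def find_master(states: Dict[int, NodeState]) -> Optional[int]:
    """Return the slot of a present, healthy node in the master state, if any."""
    for slot in sorted(states):
        state = states[slot]
        if state.is_master and state.present and state.healthy:
            return slot
    return None
```

A node that holds such a table for every slot can tell at a glance whether a working master exists or whether arbitration is needed.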
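It can be understood that, besides full interconnection, a bus can also be used to ensure that the logic circuits of any two service nodes are directly connected, communicate directly, and exchange state information.
The logic circuit 217 can be implemented in several ways. One feasible option is a CPLD; other feasible options include a LAN switch chip or a CAN circuit.
After power-on and startup, the BMC 215 runs the first management submodule 213 and the second management submodule 211. The first management submodule 213 manages the local service node, and the second management submodule 211 manages the service nodes other than the local service node as well as all chassis common components.
The logic circuit 217 is connected to the BMC 215 by a bus and communicates with the BMC 215 to exchange node state information and management operation instructions.
It is easy to understand that the hardware structure of service node 209a also includes hardware components such as storage media and some buses that are not shown in FIG. 2.
Service nodes 209a, 209b, 209c, and 209d have the same hardware structure, and each is capable of managing the entire computer device 200. However, a service node can perform management operations on the entire computer device 200 only when its BMC is up and running and in the master state. The BMCs of the other service nodes are in the slave state, which means that they cannot perform management operations on the entire computer device 200. The slave state also means that, when the service node in the master state cannot continue to perform management operations, a service node in the slave state acts as a backup, switches to the master state, and then manages the entire computer device 200. This is explained in detail later.
Take the case in which service node 209a is in the master state and the other three service nodes are in the slave state. When the BMC 215 is in the master state, the second management submodule 211 is in the working state, meaning that it is managing the chassis common components and the other service nodes. The BMCs of the other three service nodes are in the slave state and their second management submodules are in the standby state, meaning that those second management submodules are not managing the chassis common components or the other service nodes.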
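The relationship between the state of the second management submodule and the master/slave state of the BMC can be sketched as follows. This is a minimal illustration, not the implementation in this application: the class and method names (`Bmc`, `manage_local_node`, `manage_chassis`) are hypothetical.

```python
from enum import Enum


class SubmoduleState(Enum):
    WORKING = "working"
    STANDBY = "standby"


class Bmc:
    """Illustrative BMC model with two management submodules."""

    def __init__(self, slot: int) -> None:
        self.slot = slot
        # The second management submodule starts in the standby state.
        self.second_submodule = SubmoduleState.STANDBY

    @property
    def role(self) -> str:
        # The BMC is master exactly when its second submodule is working.
        if self.second_submodule is SubmoduleState.WORKING:
            return "master"
        return "slave"

    def manage_local_node(self) -> str:
        # First management submodule: always manages the local node.
        return f"slot {self.slot}: managing local node"

    def manage_chassis(self, component: str) -> str:
        # Second management submodule: only acts in the working state.
        if self.role != "master":
            raise PermissionError("second submodule is in standby")
        return f"slot {self.slot}: managing {component}"
```

Rejecting chassis operations from a slave BMC mirrors the stated goal of avoiding conflicting operations from several service nodes at once.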
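It should be further noted that the technical solution of this application is not limited to the division into a first and a second management submodule used in this embodiment. In another feasible embodiment, the BMC can run three submodules after power-on and startup: a first management submodule that manages the local node, a second management submodule that manages the chassis common components, and a third management submodule that manages the other service nodes. Other divisions of these functions are equally conceivable and are not listed here one by one.
It can be understood that the embodiment in FIG. 2 uses four service nodes as an example, but other embodiments can have any number of service nodes. The numbers of fan modules and power modules are not limited either and can be any number. Some components of a computer device, such as switching nodes and interface modules, are not shown for the computer device 200 in this embodiment, but it is easy to understand that the computer device 200 includes such nodes and components as well.
The service nodes described above process customer services. A service node can be a blade server node or a rack server node, or a switching node, a storage node, a computing node, and so on. The chassis common components are the components that provide shared chassis resources, including the chassis, power modules, fan modules, interface modules, and so on. When the service nodes are rack server nodes, a backplane is no longer needed to connect them; the rack server nodes can be connected to one another by cables.
It should be further explained that a service node is in the master state when its BMC is in the master state, and in the slave state when its BMC is in the slave state.
The computer device 200 shown in FIG. 2 can be a blade server device, a switching device that includes multiple switching nodes, a storage device that contains multiple storage nodes, and so on.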
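FIG. 3 is a schematic structural diagram 300 of a user accessing and managing the computer device 200. After power-on, the BMC 215 runs the first management module 213 and the second management module 211. The second management module 211 manages the other service nodes 308 and the chassis common components 306, and the first management module 213 manages the local service node 310. By connecting to the interactive interface 302 of the BMC 215, the user can manage the entire computer device 200. The interactive interface 302 can be the Simple Network Management Protocol (SNMP), Redfish, the Intelligent Platform Management Interface (IPMI), a web interface, a command-line interface (CLI), and so on.
If a computer device has management nodes, the user can connect to them by cable to manage the computer device. There are only two management nodes, one master and one standby. Once both are cabled, no cable needs to be re-plugged even if the master and standby roles are swapped, and the computer device can still be accessed and managed remotely. After the management nodes are removed, however, the number of service nodes is usually more than two. In the computer device 200 shown in FIG. 2, for example, there are four service nodes, and cabling the user to that many service nodes at the same time is not an efficient solution. An embodiment of this application provides a feasible solution, shown in FIG. 4. Taking two service nodes as an example, the BMCs 410a and 410b of the two service nodes are connected to each other. When the BMC 410b is up and running and in the master state, the first management module 404b that it runs manages the local service node 408b. The local service node is the service node to which the BMC belongs. The second management module 406b that it runs manages the other service nodes 412 and the chassis common components 414, performs management operations, and obtains management information. The other service nodes 412 include the service node to which the BMC 410a belongs. The user is still connected to the interactive interface 402a, but because the BMCs 410a and 410b are connected, the user can reach the BMC 410b through 402a, access the management information of the computer device 200, and perform management operations on the computer device 200. The cross 416 in FIG. 4 indicates that the second management module 406a, which is in the standby state, does not manage the other service nodes 412 or the chassis common components 414 at this time. The first management module 404a manages the local service node 408a, to which the BMC 410a belongs. Because the user is not directly connected to the BMC 410b, the interactive interface 402b of the BMC 410b can be idle at this time.
Therefore, with the technical solution of the above embodiment, when the computer device 200 has multiple service nodes, the user can connect to any one of them and thereby access and manage the entire computer device 200. This removes the need to manually re-plug the user's access cable into the service node that is in the master state whenever the states of the service nodes change.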
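The access scheme of FIG. 4, in which the user stays attached to one interface while requests reach whichever BMC is currently master, can be sketched as simple forwarding between connected BMCs. This is a hedged illustration: the `Bmc` class, `connect`, and `handle` are hypothetical names, and the real forwarding mechanism between BMCs is not specified in this application.

```python
from typing import List


class Bmc:
    """Illustrative BMC that forwards user requests to the master BMC."""

    def __init__(self, slot: int, is_master: bool = False) -> None:
        self.slot = slot
        self.is_master = is_master
        self.peers: List["Bmc"] = []

    def connect(self, other: "Bmc") -> None:
        # BMCs of any two service nodes are connected to each other.
        self.peers.append(other)
        other.peers.append(self)

    def handle(self, request: str) -> str:
        # The master BMC executes the management request itself.
        if self.is_master:
            return f"BMC {self.slot} (master) executed: {request}"
        # A slave BMC passes the request on to the connected master.
        for peer in self.peers:
            if peer.is_master:
                return peer.handle(request)
        raise RuntimeError("no BMC is currently in the master state")
```

If the master role later moves to another node, the user keeps calling `handle` on the same BMC and the request simply lands on the new master.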
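The state of a service node changes with actual conditions, from the master state to the slave state or from the slave state to the master state. FIG. 5 is a schematic flowchart of the switching that occurs when the state of a service node changes according to an embodiment of this application. In the first scenario, either all service nodes are in the slave state and none is in the master state, or the service node in the master state has failed and its health state is abnormal. All service nodes, or all service nodes other than the faulty node, verify this condition. After a verification time T2 they enter the arbitration state, and after an arbitration time T1 and a verification time T3 each service node enters its corresponding state: one service node enters the master state and the others enter the slave state. In the second scenario, the service node in the master state requests demotion to the slave state. All service nodes, or all service nodes other than the one requesting demotion, verify this condition. After the verification time T2 they enter the arbitration state, and after the arbitration time T1 and the verification time T3 each service node enters its corresponding state: one service node enters the master state and the others enter the slave state. Note that a faulty service node cannot enter the master state, but a service node that requested demotion can still enter the master state through arbitration. "Demotion" here means being lowered to the slave state.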
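To prevent the situation in which the service node in the master state fails but the service nodes in the slave state do not notice in time and do not arbitrate a new service node into the master state, an embodiment of this application provides a feasible technical solution, shown in FIG. 6. Each service node scans the state information of itself and of the other service nodes in real time, and promptly updates and synchronizes the state information of all nodes. The scanning steps of the local node are as follows:
In step S602, scanning starts.
In step S604, within a time period of length T2, the state information of the service nodes is scanned and verified multiple times.
In step S606, the state information of the service node in the master state is obtained, and it is determined whether that node is abnormal or has requested demotion. If it is abnormal or has requested demotion, the process jumps to step S608. If it is normal and has not requested demotion, the process jumps to step S612. A service node being abnormal means that it cannot work normally and cannot perform management operations.
In step S608, when the state information of the service node in the master state shows that it is abnormal or has requested demotion, all service nodes, or all service nodes other than the abnormal or demotion-requesting node, enter the arbitration state.
In step S610, the service nodes that entered the arbitration state complete the arbitration operation, and the local node obtains the arbitration result.
In step S612, the hardware heartbeat of the local node is maintained to inform the other service nodes of the health state of the local node.
In step S614, the state information of the local node is updated: according to the arbitration result, the local node stays in the slave state or enters the master state. The process then returns to step S604 for the next scan and verification period of length T2.
By repeatedly scanning the state information of all nodes in real time in this way, the working condition of all service nodes is monitored, which ensures that the computer device always has a normally working service node in the master state and that the user's services run normally.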
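One pass through steps S606 to S614 can be sketched as a single function. The sketch makes several simplifying assumptions that are not in this application: node states are kept in one shared dictionary with hypothetical keys, the repeated scanning within T2 (step S604) is omitted, only healthy nodes take part in arbitration, and the arbitration itself is passed in as a callable.

```python
from typing import Callable, Dict, List, Optional


def scan_cycle(states: Dict[int, dict], my_slot: int,
               arbitrate: Callable[[List[int]], int]) -> str:
    """Run one simplified scan cycle for the node in `my_slot`.

    `states` maps slot -> {"role", "healthy", "wants_demotion", "heartbeat"}
    (illustrative keys). Returns the local node's role after the cycle.
    """
    # S606: find the current master and judge its state.
    master: Optional[int] = next(
        (slot for slot, st in states.items() if st["role"] == "master"), None)
    needs_arbitration = (
        master is None
        or not states[master]["healthy"]
        or states[master]["wants_demotion"]
    )
    if needs_arbitration:
        # S608: healthy nodes enter arbitration (a faulty node cannot win).
        candidates = sorted(s for s, st in states.items() if st["healthy"])
        if candidates:
            # S610: obtain the arbitration result.
            winner = arbitrate(candidates)
            for slot, st in states.items():
                st["role"] = "master" if slot == winner else "slave"
                st["wants_demotion"] = False
    # S612: maintain the local heartbeat.
    states[my_slot]["heartbeat"] += 1
    # S614: the local role now reflects the arbitration result.
    return states[my_slot]["role"]
```

Passing `min` as `arbitrate` simply picks the lowest healthy slot; a real system would plug in the voting procedure instead.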
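Next, the arbitration process and method are explained further. The purpose of arbitration is to select one service node to enter the master state and manage the entire computer device. FIG. 7 is a flowchart of a method for selecting, by arbitration, the service node that enters the master state according to an embodiment of this application.
In step S702, arbitration starts for the service nodes that have entered the arbitration state.
In step S704, according to the master selection principle, every service node, or every service node that has entered the arbitration state, casts a vote. The votes are then counted, and the counts are synchronized to every service node that has entered the arbitration state. The service node with the most votes enters the master state, which means that it manages the entire computer device. When several service nodes have the same number of votes, their slot numbers are compared, and the one with the smallest slot number enters the master state and manages the entire computer device. The slot number is a common concept in this field: a unique number, assigned in ascending order, that identifies the physical position of a node, module, or component in the computer device.
In step S706, according to the result of voting under the master selection principle, the state information of every service node, or of every service node that has entered the arbitration state, is updated.
In step S708, within a time period of length T1, voting under the master selection principle is performed multiple times, and the voting results and the state information of every service node, or of every service node that has entered the arbitration state, are verified. The purpose is to confirm the master selection result repeatedly and to keep noise from affecting it. The verification performed within one T1 period counts as one verification.
In step S710, when the number of verifications exceeds a preset value N, the process jumps to step S712; otherwise it jumps back to step S704. N is a positive integer.
In step S712, the arbitration operation and process are complete, and one service node is finally determined to enter the master state.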
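The repeated voting and verification of steps S704 to S712 can be sketched as a loop that accepts a winner only after it has been confirmed more than N times. Note the assumptions: restarting the count when the winner changes between rounds is one possible reading of "confirming the result repeatedly", not something this application states, and the names `tally`, `arbitrate`, and `collect_votes` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Optional


def tally(votes: Dict[int, int]) -> int:
    """Most votes wins; ties go to the smallest slot number (S704)."""
    counts = Counter(votes.values())
    return min(counts, key=lambda slot: (-counts[slot], slot))


def arbitrate(collect_votes: Callable[[], Dict[int, int]], n: int) -> int:
    """Repeat voting until one winner is confirmed more than `n` times.

    `collect_votes` returns one round of votes (voter slot -> candidate
    slot); each call stands for one verification period T1 (S708).
    """
    winner: Optional[int] = None
    confirmed = 0
    while confirmed <= n:              # S710: continue until count exceeds N
        candidate = tally(collect_votes())
        if candidate == winner:
            confirmed += 1
        else:
            # Assumption: a changed result restarts the confirmation count.
            winner, confirmed = candidate, 1
    return winner                      # S712: arbitration complete
```

With `n = 2`, a single noisy round in the middle of otherwise stable voting delays the decision but does not change it.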
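The above method selects a service node to enter the master state accurately and stably. During voting, however, abnormal situations can occur. For example, the communication links between some nodes may fail, so that those nodes cannot synchronize their state information to every service node. A follow-vote mechanism is therefore needed for this scenario, to solve the problem that the state information of other service nodes cannot be obtained and to ensure that the affected service nodes are still able to vote for other service nodes.
FIG. 8 is a schematic diagram of a follow-vote structure according to an embodiment of this application. For example, there are three service nodes X, Y, and Z. When the link between service node X and service node Y fails, service node X cannot obtain the state information of service node Y, so service node X cannot vote for service node Y in the conventional way. In this scenario, the state information of service node Y can be passed to service node X through service node Z, so that X is able to vote on service node Y, and the voting result is synchronized to service node Y through service node Z. It is easy to understand that, in a computer device with more service nodes and more link failures, the state information of each service node can still be passed to every service node through intermediate nodes and the links between them, and the voting result of each service node can be synchronized to every service node in the same way.
FIG. 9 is a flowchart of a follow-vote method according to an embodiment of this application and describes the follow-vote mechanism in detail.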
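In step S902, the follow-vote process of service node X starts.
In step S904, service node Y satisfies the master selection principle. The master selection principle has been described in detail above and is not repeated here.
In step S906, it is determined whether service node Y is not in place. If service node X detects that it cannot directly obtain the state information of service node Y, the result is that service node Y is not in place, and the process jumps to step S908. If service node X detects that it can directly obtain the state information of service node Y, the result is that service node Y is in place, and the process jumps to step S912.
In step S908, it is checked whether service node Z, which has already voted for service node Y, works normally. If so, the process jumps to step S912; if not, it jumps to step S910. Note that when service node Z is found not to be working normally, the vote it has already cast for service node Y is invalid.
In step S910, because service node Z is abnormal, service node X cannot determine the state information of service node Y, so service node X refuses to vote for service node Y.
In step S912, because service node Z works normally, service node X can obtain and evaluate the state information of service node Y, so service node X agrees to vote for service node Y.
In step S914, after service node X agrees or refuses to vote for service node Y, the follow-vote process of service node X ends.
With the above technical solution, when the state information and voting results of some service nodes cannot be passed directly to every service node, they can be passed indirectly through an intermediate service node. In this embodiment, service node Z is the intermediate node between service node X and service node Y.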
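The decision that service node X makes in steps S906 to S912 can be sketched as a small function. The sketch assumes that X knows, for each potential intermediate node, whether that node has voted for Y and whether it is working normally; the `Relay` type and `follow_vote` name are hypothetical, and refusing when no such intermediate node exists is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Relay:
    """Illustrative view of an intermediate node such as service node Z."""
    slot: int
    voted_for_y: bool   # has this node already voted for node Y?
    healthy: bool       # is this node working normally?


def follow_vote(x_reaches_y: bool, relays: Iterable[Relay]) -> bool:
    """Return True if node X agrees to vote for node Y, False if it refuses."""
    # S906: Y is "in place" if X can obtain its state information directly.
    if x_reaches_y:
        return True                      # S912: agree
    # S908: otherwise rely on a working node that has already voted for Y.
    for relay in relays:
        if relay.voted_for_y and relay.healthy:
            return True                  # S912: agree (follow the vote)
    # S910: no trustworthy intermediate node, so X refuses.
    return False
```

A vote cast by a faulty intermediate node is ignored here, matching the note that such a vote is invalid.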
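It should be understood that the functional modules in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more modules may be integrated into one module. The integrated modules can be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The computer instructions of modules implemented in the form of software functional modules can be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute some steps of the methods described in the embodiments of the present invention. The storage medium may be a readable non-volatile storage medium, including a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the corresponding technical solutions to depart from the protection scope of the claims.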

Claims (21)

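  1. A computer device, characterized in that the computer device comprises chassis common components and a plurality of service nodes, wherein each service node comprises a baseboard management controller (BMC), and the BMC is connected to the chassis common components.
  2. The computer device according to claim 1, characterized in that the BMC of any one of the plurality of service nodes is configured to manage the chassis common components when in a master state.
  3. The computer device according to claim 2, characterized in that the BMC of any one of the plurality of service nodes is further configured to manage the plurality of service nodes when in the master state.
  4. The computer device according to any one of claims 1 to 3, characterized in that the BMC of each service node is further configured to run a first management submodule and a second management submodule,
    the first management submodule is configured to manage the local service node;
    the states of the second management submodule comprise a working state and a standby state; and
    when the second management submodule is in the working state, the second management submodule is configured to manage the chassis common components and the service nodes other than the local service node.
  5. The computer device according to claim 4, characterized in that when the running second management submodule is in the working state, the BMC of the local service node is in the master state; and when the running second management submodule is in the standby state, the BMC of the local service node is in a slave state.
  6. The computer device according to any one of claims 1 to 5, characterized in that the BMCs of any two of the plurality of service nodes are connected to each other, and any one of the plurality of service nodes is configured to be accessed by a user to manage the computer device.
  7. The computer device according to any one of claims 1 to 6, characterized in that each service node further comprises a logic circuit, the logic circuits of any two of the plurality of service nodes are connected to each other, and the logic circuit is configured to obtain state information of the plurality of service nodes.
  8. The computer device according to claim 7, characterized in that the logic circuit comprises a complex programmable logic device (CPLD), a local area network switching device, or a controller area network (CAN) circuit.
  9. The computer device according to claim 8, characterized in that the logic circuits of any two of the plurality of service nodes are connected by full interconnection or by bus interconnection.
  10. The computer device according to any one of claims 2 to 9, characterized in that the plurality of service nodes are further configured to, when no service node among the plurality of service nodes has a BMC in the master state, select the BMC of one service node to enter the master state according to a master selection principle.
  11. The computer device according to any one of claims 2 to 9, characterized in that the plurality of service nodes are further configured to, when the service node to which the BMC in the master state belongs becomes abnormal, or when the BMC of the service node in the master state requests a state switch, select the BMC of one service node to enter the master state according to a master selection principle.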
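  12. A management method for a computer device, characterized in that the computer device comprises chassis common components and a plurality of service nodes,
    each service node comprises a baseboard management controller (BMC), and the BMC is connected to the chassis common components; and
    the management method comprises: when in a master state, the BMC of any one of the plurality of service nodes manages the chassis common components.
  13. The management method for a computer device according to claim 12, characterized in that when in the master state, the BMC of any one of the plurality of service nodes also manages the plurality of service nodes.
  14. The management method for a computer device according to any one of claims 12 to 13, characterized in that the BMC of each service node also runs a first management submodule and a second management submodule,
    the first management submodule manages the local service node;
    the states of the second management submodule comprise a working state and a standby state; and
    when the second management submodule is in the working state, the second management submodule manages the chassis common components and the service nodes other than the local service node.
  15. The management method for a computer device according to claim 14, characterized in that when the second management submodule is in the working state, the BMC of the local service node is in the master state; and when the second management submodule is in the standby state, the BMC of the local service node is in a slave state.
  16. The management method for a computer device according to any one of claims 12 to 15, characterized in that the BMCs of any two of the plurality of service nodes are connected to each other, and the computer device is managed by accessing any one of the plurality of service nodes.
  17. The management method for a computer device according to any one of claims 12 to 16, characterized in that each service node further comprises a logic circuit, the logic circuits of any two of the plurality of service nodes are connected to each other, and the logic circuit obtains state information of the plurality of service nodes.
  18. The management method for a computer device according to any one of claims 12 to 17, characterized in that when no service node among the plurality of service nodes has a BMC in the master state, the BMC of one service node is selected to enter the master state according to a master selection principle.
  19. The management method for a computer device according to any one of claims 12 to 17, characterized in that when the service node to which the BMC in the master state belongs becomes abnormal, or when the BMC of the service node in the master state requests a state switch, the BMC of one service node is selected to enter the master state according to a master selection principle.
  20. The management method for a computer device according to any one of claims 18 to 19, characterized in that the master selection principle comprises: votes are counted, and the service node with the most votes enters the master state; and when votes are tied, the tied service node with the smallest slot number enters the master state.
  21. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and when the computer instructions in the computer-readable storage medium are executed by a computer device, the computer device performs the method according to any one of claims 12 to 20, or implements the functions of the computer device according to any one of claims 1 to 11.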
PCT/CN2021/124249 2020-10-16 2021-10-16 Computer device and management method WO2022078519A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21879560.7A EP4213017A4 (en) 2020-10-16 2021-10-16 COMPUTER DEVICE AND MANAGEMENT METHOD
US18/298,739 US20230244550A1 (en) 2020-10-16 2023-04-11 Computer device and management method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011112672.3A CN114385319A (zh) 2020-10-16 2020-10-16 Computer device and management method
CN202011112672.3 2020-10-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/298,739 Continuation US20230244550A1 (en) 2020-10-16 2023-04-11 Computer device and management method

Publications (1)

Publication Number Publication Date
WO2022078519A1 (zh) 2022-04-21

Family

ID=81193063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124249 WO2022078519A1 (zh) 2020-10-16 2021-10-16 Computer device and management method

Country Status (4)

Country Link
US (1) US20230244550A1 (zh)
EP (1) EP4213017A4 (zh)
CN (1) CN114385319A (zh)
WO (1) WO2022078519A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115456640A (zh) * 2022-08-17 2022-12-09 广东省第二人民医院(广东省卫生应急医院) Medicine monitoring and tracing method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1863081A (zh) * 2005-10-14 2006-11-15 华为技术有限公司 Management system and method based on the Intelligent Platform Management Interface
CN102187640A (zh) * 2011-04-13 2011-09-14 华为技术有限公司 Multi-service node management system, apparatus, and method
US20160078342A1 (en) * 2012-05-04 2016-03-17 Transoft (Shanghai), Inc. Systems and methods of autonomic virtual network management
CN104506362A (zh) * 2014-12-29 2015-04-08 浪潮电子信息产业股份有限公司 Method for system state switching and monitoring on a CC-NUMA multi-node server
CN105549696A (zh) * 2015-12-07 2016-05-04 中国电子科技集团公司第三十二研究所 Rack-mounted server system with a chassis management function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4213017A4 *

Also Published As

Publication number Publication date
US20230244550A1 (en) 2023-08-03
EP4213017A4 (en) 2024-04-10
EP4213017A1 (en) 2023-07-19
CN114385319A (zh) 2022-04-22

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21879560; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 2021879560; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 2021879560; Country of ref document: EP; Effective date: 20230413
NENP Non-entry into the national phase. Ref country code: DE