CN105426286A - System for monitoring whole rack server - Google Patents
- Publication number
- CN105426286A (application number CN201510745328.0A)
- Authority
- CN
- China
- Prior art keywords
- whole machine
- machine cabinet
- module
- cabinet server
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a system for monitoring a whole rack server. The system comprises a data acquisition module, a data processing module, a control module and a power module, wherein the data acquisition module is used for acquiring running data of a running state of the whole rack server and storing the acquired running data in an internal cache for data access by the data processing module; the data processing module is connected with the data acquisition module and used for obtaining the running data and storing the running data in the internal cache for data access by the control module; and the control module is connected with the data processing module and the power module and is used for obtaining the running data and power information of the power module to monitor the whole rack server in real time. According to the system for monitoring the whole rack server, the timeliness of monitoring each device of the whole rack server is improved, device faults can be timely and effectively discovered and handled, and the availability and reliability of the whole rack server are improved.
Description
Technical field
The present invention relates to the field of server technology, and in particular to a system for monitoring a whole rack server.
Background
As users' performance requirements for computers rise, the number of servers they need keeps growing. Compared with traditional servers, the SmartRack whole rack server has significant advantages in node density and TCO (total cost of ownership), and it is increasingly widely used in practice.
The SmartRack whole rack server integrates compute nodes, storage nodes, fans, power supplies and other devices inside the cabinet. Each of these devices has its own firmware (FW) and can monitor itself. Because the devices inside a SmartRack cabinet are numerous and their interfaces and communication protocols differ, the monitoring architecture of a traditional server, in which a BMC monitors the information of all devices inside the cabinet, cannot meet the monitoring needs of a rack server in terms of either response time or management complexity.
Summary of the invention
The object of the present invention is to provide a system for monitoring a whole rack server, so that the whole rack server is monitored in real time, device faults are discovered and handled in a timely and effective manner, and the availability and reliability of the whole rack server are improved.
To solve the above technical problem, the present invention provides a system for monitoring a whole rack server, comprising:
a data acquisition module, a data processing module, a control module and a power module;
wherein the data acquisition module is used to acquire running data on the running state of the whole rack server and to store the acquired running data in an internal cache for access by the data processing module;
the data processing module is connected with the data acquisition module and is used to obtain the running data and store it in an internal cache for access by the control module;
the control module is connected with the data processing module and the power module and is used to obtain the running data and the power information of the power module, so as to monitor the whole rack server in real time.
Optionally, the data acquisition module comprises:
a first acquisition unit, used to acquire the state of the compute nodes in the whole rack server and to store the acquired compute node status information in an internal cache for access by the data processing module;
a second acquisition unit, used to acquire the state of the storage nodes in the whole rack server and to store the acquired storage node status information in an internal cache for access by the data processing module;
a third acquisition unit, used to acquire fan information on the running state of the fans of the whole rack server.
Optionally, the control module is specifically used to monitor the obtained running data and the fault information and status information of the power module, so as to perform fault monitoring on the whole rack server.
Optionally, the third acquisition unit is specifically used to obtain the fan information through a fan control board.
Optionally, the data processing module is specifically a node midplane.
Optionally, the fan control board is connected with the node midplane through an I2C bus.
Optionally, the first acquisition unit and the second acquisition unit are connected with the node midplane through an IPMB bus.
Optionally, the control module is connected with the node midplane through an I2C bus, a serial port or a network cable.
Optionally, the control module is connected with the power module through an I2C bus.
In the system for monitoring a whole rack server provided by the present invention, the data acquisition module acquires running data on the running state of the whole rack server and stores the acquired running data in an internal cache for access by the data processing module; the data processing module obtains this running data and stores it in an internal cache for access by the control module; and the control module obtains this running data and the power information of the power module, so as to monitor the whole rack server in real time. The system improves the timeliness with which each device of the whole rack server is monitored, allows device faults to be discovered and handled in a timely and effective manner, and improves the availability and reliability of the whole rack server.
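The chain of internal caches described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the dictionary-based caches and the `power.` key prefix are all assumptions made for illustration, and real modules would read hardware rather than receive readings as arguments.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AcquisitionModule:
    """Collects running data and keeps it in its own internal cache."""
    cache: Dict[str, float] = field(default_factory=dict)

    def collect(self, readings: Dict[str, float]) -> None:
        # Hypothetical: a real module would read sensors; here readings are passed in.
        self.cache.update(readings)


@dataclass
class ProcessingModule:
    """Pulls data from the acquisition module into its own internal cache."""
    source: AcquisitionModule
    cache: Dict[str, float] = field(default_factory=dict)

    def refresh(self) -> None:
        # Copy so that later acquisition updates do not change this cache silently.
        self.cache = dict(self.source.cache)


@dataclass
class PowerModule:
    """Holds power information that the control module reads directly."""
    info: Dict[str, float] = field(default_factory=dict)


@dataclass
class ControlModule:
    """Reads the processing module's cache and the power module's information."""
    processing: ProcessingModule
    power: PowerModule

    def snapshot(self) -> Dict[str, float]:
        # Merge cached running data with power information (prefix is an assumption).
        merged = dict(self.processing.cache)
        merged.update({f"power.{key}": value for key, value in self.power.info.items()})
        return merged
```

Each tier reads only the cache of the tier directly below it, so the control module never has to speak the device-level protocols itself.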
Brief description of the drawings
Fig. 1 is a structural block diagram of one embodiment of the system for monitoring a whole rack server provided by the present invention;
Fig. 2 is a schematic diagram of another embodiment of the system for monitoring a whole rack server provided by the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 shows a structural block diagram of one embodiment of the system for monitoring a whole rack server provided by the present invention. The system comprises:
a data acquisition module 1, a data processing module 2, a control module 3 and a power module 4;
wherein the data acquisition module 1 is used to acquire running data on the running state of the whole rack server and to store the acquired running data in an internal cache for access by the data processing module;
the data processing module 2 is connected with the data acquisition module 1 and is used to obtain the running data and store it in an internal cache for access by the control module;
the control module 3 is connected with the data processing module 2 and the power module 4 and is used to obtain the running data and the power information of the power module, so as to monitor the whole rack server in real time.
In the system for monitoring a whole rack server provided by the present invention, the data acquisition module acquires running data on the running state of the whole rack server and stores the acquired running data in an internal cache for access by the data processing module; the data processing module obtains this running data and stores it in an internal cache for access by the control module; and the control module obtains this running data and the power information of the power module, so as to monitor the whole rack server in real time. The system improves the timeliness with which each device of the whole rack server is monitored, allows device faults to be discovered and handled in a timely and effective manner, and improves the availability and reliability of the whole rack server.
In one embodiment, the above data acquisition module 1 may specifically comprise:
a first acquisition unit, used to acquire the state of the compute nodes in the whole rack server and to store the acquired compute node status information in an internal cache for access by the data processing module;
a second acquisition unit, used to acquire the state of the storage nodes in the whole rack server and to store the acquired storage node status information in an internal cache for access by the data processing module;
a third acquisition unit, used to acquire fan information on the running state of the fans of the whole rack server.
Specifically, the third acquisition unit may obtain the fan information through a fan control board.
In one embodiment, the above control module 3 may specifically be used to monitor the obtained running data and the fault information and status information of the power module, so as to perform fault monitoring on the whole rack server.
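As a rough illustration of the fault monitoring performed by the control module, the following Python sketch scans cached running data and power status for problems. The patent does not define any thresholds or data layout, so the `LIMITS` values, the field names `temp_c` and `fan_rpm`, and the boolean power status are all hypothetical.

```python
from typing import Dict, List

# Hypothetical limits; the patent does not specify any thresholds.
LIMITS = {"temp_c": 85.0, "fan_rpm_min": 1000.0}


def find_faults(
    running_data: Dict[str, Dict[str, float]],
    power_status: Dict[str, bool],
) -> List[str]:
    """Return human-readable fault messages, ordered by device name."""
    faults: List[str] = []
    for device, readings in sorted(running_data.items()):
        # A missing temperature reading is treated as "no over-temperature fault".
        if readings.get("temp_c", 0.0) > LIMITS["temp_c"]:
            faults.append(f"{device}: over temperature")
        if "fan_rpm" in readings and readings["fan_rpm"] < LIMITS["fan_rpm_min"]:
            faults.append(f"{device}: fan speed too low")
    for psu, healthy in sorted(power_status.items()):
        # The power module is assumed to report a simple healthy/faulty flag.
        if not healthy:
            faults.append(f"{psu}: power fault reported")
    return faults
```

For example, `find_faults({"node0": {"temp_c": 90.0}}, {"psu0": False})` would report one over-temperature fault and one power fault.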
Fig. 2 shows a schematic diagram of another embodiment of the system for monitoring a whole rack server provided by the present invention. This specific embodiment is described using the SmartRack whole rack server as an example.
In this embodiment, the first acquisition unit and the second acquisition unit follow the approach of a traditional cabinet and use a BMC and an EMC to monitor the compute nodes and the storage nodes respectively; the data processing module is implemented by a node midplane, and the control module is implemented by an RMC.
Specifically, the BMC monitors the compute node state and stores the monitoring data in an internal cache for access by the node midplane. The EMC monitors the storage node state and stores the monitoring data in an internal cache for access by the node midplane.
The node midplane obtains compute node status information through the BMC and storage node status information through the EMC, and accesses the fan control board connected to it to obtain fan information and control the fan speed. Finally, it stores the compute node, storage node and fan information in an internal cache for access by the RMC.
The RMC obtains the compute node, storage node, fan and power information of the whole rack server through the node midplane, and directly polls the fan and power information of the devices connected directly to the RMC. It provides a unified external interface, thereby realizing monitoring of the SmartRack whole rack information.
Specifically, the interior of the SmartRack whole rack server can be divided into different trays. Each tray has a node midplane that serves as the second-level monitor of the cabinet's devices. The fan control board inside the tray is connected to the node midplane through an I2C bus, while the BMC and EMC are connected to the node midplane through an IPMB bus. The node midplane traverses the fan and node information on the whole tray and saves this information in its internal cache.
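The per-tray polling performed by the node midplane can be sketched in Python as follows. This is an illustration under stated assumptions: the bus accesses (IPMB to the BMC and EMC, I2C to the fan control board) are replaced by plain callables, and the linear fan-duty ramp in `fan_duty_for` is a hypothetical policy, since the patent only states that the midplane controls fan speed without saying how.

```python
from typing import Callable, Dict

# A reader stands in for one bus transaction that returns a set of readings.
Reader = Callable[[], Dict[str, float]]


def fan_duty_for(temp_c: float) -> int:
    """Hypothetical linear ramp: 30% duty at or below 30 C, 100% at or above 70 C."""
    if temp_c <= 30.0:
        return 30
    if temp_c >= 70.0:
        return 100
    return round(30 + (temp_c - 30.0) * 70.0 / 40.0)


class NodeMidplane:
    """Second-level monitor for one tray; caches what it reads for the RMC."""

    def __init__(
        self,
        read_bmc: Reader,        # stands in for an IPMB request to the BMC
        read_emc: Reader,        # stands in for an IPMB request to the EMC
        read_fan_board: Reader,  # stands in for an I2C read of the fan control board
        set_fan_duty: Callable[[int], None],
    ) -> None:
        self._read_bmc = read_bmc
        self._read_emc = read_emc
        self._read_fan_board = read_fan_board
        self._set_fan_duty = set_fan_duty
        self.cache: Dict[str, Dict[str, float]] = {}

    def poll(self) -> int:
        """Traverse the tray once, refresh the cache and apply a fan duty cycle."""
        self.cache = {
            "compute": self._read_bmc(),
            "storage": self._read_emc(),
            "fan": self._read_fan_board(),
        }
        hottest = max(
            self.cache["compute"].get("temp_c", 0.0),
            self.cache["storage"].get("temp_c", 0.0),
        )
        duty = fan_duty_for(hottest)
        self._set_fan_duty(duty)
        return duty
```

Passing the bus accesses in as callables keeps the sketch testable without hardware; a real midplane would substitute its own IPMB and I2C drivers.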
In this embodiment, the RMC, as the monitoring center of the cabinet, is connected to the power module through an I2C bus and to the node midplane through an I2C bus, a serial port or a network cable. It obtains the fault information and status information of the power module over I2C, and obtains the fan and node information on each tray by accessing the node midplane.
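Because the RMC may reach a node midplane over I2C, a serial port or a network cable, one way to picture its role is as a poller over interchangeable links. The Python sketch below assumes a hypothetical `Link` interface with a single `read()` method and uses a `StubLink` in place of real transports; none of these names come from the patent.

```python
from typing import Dict, Protocol


class Link(Protocol):
    """Any transport (I2C, serial, network) that can return cached readings."""

    def read(self) -> Dict[str, float]: ...


class StubLink:
    """Stand-in transport that returns a fixed payload (for illustration only)."""

    def __init__(self, payload: Dict[str, float]) -> None:
        self._payload = dict(payload)

    def read(self) -> Dict[str, float]:
        return dict(self._payload)


class RackManagementController:
    """Top-level monitor: reads each tray's midplane cache and the power module."""

    def __init__(self, tray_links: Dict[str, Link], power_link: Link) -> None:
        self._tray_links = tray_links
        self._power_link = power_link  # I2C in the embodiment described above

    def poll(self) -> Dict[str, Dict[str, float]]:
        """Build one rack-wide view keyed by tray name, plus a 'power' entry."""
        view = {tray: link.read() for tray, link in self._tray_links.items()}
        view["power"] = self._power_link.read()
        return view
```

The result of `poll()` is a single dictionary covering every tray and the power module, which corresponds loosely to the unified view the RMC exposes to users.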
As can be seen, the present invention uses the RMC as the monitoring center of the whole rack server to provide services to users; the node midplane serves as the second-level monitoring layer of the rack server and is responsible for monitoring the compute nodes, storage nodes and fans connected to it; the BMC is responsible for monitoring the compute nodes and the EMC for monitoring the storage nodes. This realizes three-level monitoring of the SmartRack whole rack server, so that accessing whole rack monitoring information through the RMC is as convenient for users as accessing the BMC of a traditional server. The invention improves the timeliness with which the RMC monitors devices such as the compute nodes, storage nodes, fans and power supplies inside the SmartRack cabinet, so that device faults can be detected and handled in a timely and effective manner, improving the availability and reliability of the whole rack server.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar between embodiments, reference may be made to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A system for monitoring a whole rack server, characterized by comprising:
a data acquisition module, a data processing module, a control module and a power module;
wherein the data acquisition module is used to acquire running data on the running state of the whole rack server and to store the acquired running data in an internal cache for access by the data processing module;
the data processing module is connected with the data acquisition module and is used to obtain the running data and store it in an internal cache for access by the control module;
the control module is connected with the data processing module and the power module and is used to obtain the running data and the power information of the power module, so as to monitor the whole rack server in real time.
2. The system for monitoring a whole rack server as claimed in claim 1, characterized in that the data acquisition module comprises:
a first acquisition unit, used to acquire the state of the compute nodes in the whole rack server and to store the acquired compute node status information in an internal cache for access by the data processing module;
a second acquisition unit, used to acquire the state of the storage nodes in the whole rack server and to store the acquired storage node status information in an internal cache for access by the data processing module;
a third acquisition unit, used to acquire fan information on the running state of the fans of the whole rack server.
3. The system for monitoring a whole rack server as claimed in claim 2, characterized in that the control module is specifically used to monitor the obtained running data and the fault information and status information of the power module, so as to perform fault monitoring on the whole rack server.
4. The system for monitoring a whole rack server as claimed in claim 3, characterized in that the third acquisition unit is specifically used to obtain the fan information through a fan control board.
5. The system for monitoring a whole rack server as claimed in claim 4, characterized in that the data processing module is specifically a node midplane.
6. The system for monitoring a whole rack server as claimed in claim 5, characterized in that the fan control board is connected with the node midplane through an I2C bus.
7. The system for monitoring a whole rack server as claimed in claim 6, characterized in that the first acquisition unit and the second acquisition unit are connected with the node midplane through an IPMB bus.
8. The system for monitoring a whole rack server as claimed in claim 7, characterized in that the control module is connected with the node midplane through an I2C bus, a serial port or a network cable.
9. The system for monitoring a whole rack server as claimed in any one of claims 1 to 8, characterized in that the control module is connected with the power module through an I2C bus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510745328.0A CN105426286B (en) | 2015-11-05 | 2015-11-05 | System for monitoring whole rack server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510745328.0A CN105426286B (en) | 2015-11-05 | 2015-11-05 | System for monitoring whole rack server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105426286A (en) | 2016-03-23 |
CN105426286B (en) | 2018-05-04 |
Family
ID=55504504
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510745328.0A Active CN105426286B (en) | System for monitoring whole rack server | 2015-11-05 | 2015-11-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105426286B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106371958A (en) * | 2016-08-31 | 2017-02-01 | 浪潮电子信息产业股份有限公司 | Server fault diagnosis system and method |
CN106598810A (en) * | 2016-12-16 | 2017-04-26 | 中国航空工业集团公司洛阳电光设备研究所 | Multi-CPU airborne data processing unit BIT monitoring architecture |
CN107239385A (en) * | 2017-06-06 | 2017-10-10 | 郑州云海信息技术有限公司 | Server and indicator light control method |
CN107239346A (en) * | 2017-06-09 | 2017-10-10 | 郑州云海信息技术有限公司 | Whole rack computing resource pool node and computing resource pooling architecture |
CN107248940A (en) * | 2017-06-12 | 2017-10-13 | 郑州云海信息技术有限公司 | Whole rack monitoring and management module, whole rack server and data center |
CN107422276A (en) * | 2017-07-31 | 2017-12-01 | 郑州云海信息技术有限公司 | Power cabinet detection device and method |
CN107543987A (en) * | 2017-08-30 | 2018-01-05 | 郑州云海信息技术有限公司 | Smart Rack status monitoring system and monitoring method |
CN107977273A (en) * | 2016-10-25 | 2018-05-01 | 郑州云海信息技术有限公司 | Memory optimization method for shared-memory node information collection in a cabinet |
CN108763022A (en) * | 2018-05-28 | 2018-11-06 | 深圳市瑞驰信息技术有限公司 | Intelligent platform management interface system based on the I2C protocol |
CN109101400A (en) * | 2018-08-16 | 2018-12-28 | 郑州云海信息技术有限公司 | Monitoring system for whole rack servers in a cloud computing data center |
CN109586994A (en) * | 2018-11-01 | 2019-04-05 | 郑州云海信息技术有限公司 | Whole rack server burn-in test monitoring method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100287395A1 (en) * | 2009-05-06 | 2010-11-11 | Via Technologies, Inc. | Computer system for processing data in non-operational state and processing method thereof |
CN102495785A (en) * | 2011-12-23 | 2012-06-13 | 创新科存储技术(深圳)有限公司 | Centralized management method and device for servers of whole equipment cabinet |
CN104820479A (en) * | 2015-04-24 | 2015-08-05 | 北京百度网讯科技有限公司 | Controlling method and device for whole cabinet server fan |
Also Published As
Publication number | Publication date |
---|---|
CN105426286B (en) | 2018-05-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||