CN105426286A - System for monitoring whole rack server - Google Patents

System for monitoring whole rack server

Info

Publication number
CN105426286A
Authority
CN
China
Prior art keywords
whole machine
machine cabinet
module
cabinet server
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510745328.0A
Other languages
Chinese (zh)
Other versions
CN105426286B (en)
Inventor
王恩东
胡雷钧
黄家明
乔英良
李冠广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510745328.0A
Publication of CN105426286A
Application granted
Publication of CN105426286B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a system for monitoring a whole rack server. The system comprises a data acquisition module, a data processing module, a control module and a power module, wherein the data acquisition module is used for acquiring running data of a running state of the whole rack server and storing the acquired running data in an internal cache for data access by the data processing module; the data processing module is connected with the data acquisition module and used for obtaining the running data and storing the running data in the internal cache for data access by the control module; and the control module is connected with the data processing module and the power module and is used for obtaining the running data and power information of the power module to monitor the whole rack server in real time. According to the system for monitoring the whole rack server, the timeliness of monitoring each device of the whole rack server is improved, device faults can be timely and effectively discovered and handled, and the availability and reliability of the whole rack server are improved.

Description

A system for monitoring a whole rack server
Technical field
The present invention relates to the field of server technology, and in particular to a system for monitoring a whole rack server.
Background
As users' performance requirements for computers rise, demand for servers keeps growing. Compared with traditional servers, the SmartRack whole rack server has significant advantages in node density and TCO, and it is increasingly widely used in practice.
The SmartRack whole rack server integrates compute nodes, storage nodes, fans, power supplies, and other devices inside the rack. Each of these devices has its own firmware (FW) and can monitor itself. Because the SmartRack rack contains many kinds of devices with differing interfaces and communication protocols, the monitoring architecture of a traditional server, in which a BMC monitors the information of all internal devices, cannot meet the monitoring needs of a rack server in terms of either response time or management complexity.
Summary of the invention
The object of the invention is to provide a system for monitoring a whole rack server that monitors the whole rack server in real time, so that device faults are discovered and handled promptly and effectively and the availability and reliability of the whole rack server are improved.
To solve the above technical problem, the invention provides a system for monitoring a whole rack server, comprising:
a data acquisition module, a data processing module, a control module, and a power module;
wherein the data acquisition module is configured to acquire running data describing the running state of the whole rack server and to store the acquired running data in an internal cache for data access by the data processing module;
the data processing module is connected to the data acquisition module and is configured to obtain the running data and to store the running data in an internal cache for data access by the control module;
the control module is connected to the data processing module and the power module and is configured to obtain the running data and the power information of the power module, so as to monitor the whole rack server in real time.
Optionally, the data acquisition module comprises:
a first acquisition unit, configured to acquire the state of the compute nodes in the whole rack server and to store the acquired compute node status information in an internal cache for data access by the data processing module;
a second acquisition unit, configured to acquire the state of the storage nodes in the whole rack server and to store the acquired storage node status information in an internal cache for data access by the data processing module;
a third acquisition unit, configured to acquire fan information describing the running state of the fans of the whole rack server.
Optionally, the control module is specifically configured to monitor the obtained running data and the fault information and status information of the power module, so as to perform fault monitoring on the whole rack server.
Optionally, the third acquisition unit is specifically configured to obtain the fan information through a fan control board.
Optionally, the data processing module is specifically a node midplane.
Optionally, the fan control board is connected to the node midplane through an I2C bus.
Optionally, the first acquisition unit and the second acquisition unit are connected to the node midplane through an IPMB bus.
Optionally, the control module is connected to the node midplane through an I2C bus, a serial port, or a network cable.
Optionally, the control module is connected to the power module through an I2C bus.
In the system for monitoring a whole rack server provided by the invention, the data acquisition module acquires running data describing the running state of the whole rack server and stores it in an internal cache for data access by the data processing module; the data processing module obtains this running data and stores it in an internal cache for data access by the control module; and the control module obtains the running data and the power information of the power module, so as to monitor the whole rack server in real time. The system improves the timeliness of monitoring each device of the whole rack server, allows device faults to be discovered and handled promptly and effectively, and improves the availability and reliability of the whole rack server.
Brief description of the drawings
Fig. 1 is a structural block diagram of one embodiment of the system for monitoring a whole rack server provided by the invention;
Fig. 2 is a schematic diagram of another embodiment of the system for monitoring a whole rack server provided by the invention.
Detailed description
To help those skilled in the art better understand the solution of the invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Fig. 1 is a structural block diagram of one embodiment of the system for monitoring a whole rack server provided by the invention. As shown in Fig. 1, the system comprises:
a data acquisition module 1, a data processing module 2, a control module 3, and a power module 4;
wherein the data acquisition module 1 is configured to acquire running data describing the running state of the whole rack server and to store the acquired running data in an internal cache for data access by the data processing module;
the data processing module 2 is connected to the data acquisition module 1 and is configured to obtain the running data and to store the running data in an internal cache for data access by the control module;
the control module 3 is connected to the data processing module 2 and the power module 4 and is configured to obtain the running data and the power information of the power module, so as to monitor the whole rack server in real time.
In the system for monitoring a whole rack server provided by the invention, the data acquisition module acquires running data describing the running state of the whole rack server and stores it in an internal cache for data access by the data processing module; the data processing module obtains this running data and stores it in an internal cache for data access by the control module; and the control module obtains the running data and the power information of the power module, so as to monitor the whole rack server in real time. The system improves the timeliness of monitoring each device of the whole rack server, allows device faults to be discovered and handled promptly and effectively, and improves the availability and reliability of the whole rack server.
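The cache hand-off between the modules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the class `CachedStage`, the function names, and the sample readings are all hypothetical, and a real system would read device firmware rather than a hard-coded dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CachedStage:
    """A stage that pulls data from an upstream source into its own cache."""
    name: str
    source: Callable[[], Dict[str, object]]  # where this stage reads from
    cache: Dict[str, object] = field(default_factory=dict)

    def refresh(self) -> None:
        # Copy the upstream snapshot into this stage's internal cache.
        self.cache = dict(self.source())

    def read(self) -> Dict[str, object]:
        # Downstream stages read the cache, never the device itself.
        return dict(self.cache)


def read_sensors() -> Dict[str, object]:
    # Hypothetical raw readings standing in for real device queries.
    return {"node0_temp_c": 41, "fan0_rpm": 5200}


acquisition = CachedStage("data acquisition module", read_sensors)
processing = CachedStage("data processing module", acquisition.read)


def control_view(power_info: Dict[str, object]) -> Dict[str, object]:
    # The control module merges cached data with power information.
    view = processing.read()
    view.update(power_info)
    return view


acquisition.refresh()
processing.refresh()
snapshot = control_view({"psu0_ok": True})
```

Because each stage serves reads from its own cache, a slow device only delays the stage that talks to it; the control module's view is as fresh as the last refresh of the stage directly below it.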
In one embodiment, the data acquisition module 1 may specifically comprise:
a first acquisition unit, configured to acquire the state of the compute nodes in the whole rack server and to store the acquired compute node status information in an internal cache for data access by the data processing module;
a second acquisition unit, configured to acquire the state of the storage nodes in the whole rack server and to store the acquired storage node status information in an internal cache for data access by the data processing module;
a third acquisition unit, configured to acquire fan information describing the running state of the fans of the whole rack server.
Specifically, the third acquisition unit may obtain the fan information through a fan control board.
In one embodiment, the control module 3 may be specifically configured to monitor the obtained running data and the fault information and status information of the power module, so as to perform fault monitoring on the whole rack server.
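A fault check of this kind might look like the following Python sketch. The patent does not give thresholds or data layouts, so the limits `MAX_TEMP_C` and `MIN_FAN_RPM`, the dictionary keys, and the function name `find_faults` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical limits; the patent does not specify any thresholds.
MAX_TEMP_C = 85
MIN_FAN_RPM = 1000


def find_faults(running_data: Dict[str, dict], power: Dict[str, dict]) -> List[str]:
    """Return human-readable fault strings for devices and power modules."""
    faults = []
    for name, info in sorted(running_data.items()):
        if info.get("temp_c", 0) > MAX_TEMP_C:
            faults.append(f"{name}: over temperature")
        if "rpm" in info and info["rpm"] < MIN_FAN_RPM:
            faults.append(f"{name}: fan too slow")
    for name, info in sorted(power.items()):
        if info.get("fault"):
            # Explicit fault information reported by the power module.
            faults.append(f"{name}: power fault ({info['fault']})")
        elif info.get("status") != "ok":
            # No fault code, but the status information is abnormal.
            faults.append(f"{name}: power status {info.get('status')}")
    return faults


faults = find_faults(
    {"compute0": {"temp_c": 90}, "fan0": {"rpm": 500}, "storage0": {"temp_c": 40}},
    {"psu0": {"status": "ok"}, "psu1": {"status": "ok", "fault": "overcurrent"}},
)
```

The function checks both inputs the text names: the cached device data and the power module's fault and status information.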
Fig. 2 is a schematic diagram of another embodiment of the system for monitoring a whole rack server provided by the invention. This specific embodiment is described using the SmartRack whole rack server as an example.
In this embodiment, the first and second acquisition units are a BMC and an EMC respectively, which monitor the compute nodes and the storage nodes in the same way as in a traditional chassis; the data processing module is implemented by a node midplane, and the control module is implemented by an RMC.
Specifically, the BMC monitors the compute node state and stores the monitoring data in an internal cache for data access by the node midplane. The EMC monitors the storage node state and stores the monitoring data in an internal cache for data access by the node midplane.
The node midplane obtains compute node status information through the BMC and storage node status information through the EMC, and accesses the fan control board connected to it to obtain fan information and control fan speed. Finally, it stores the compute node, storage node, and fan information in an internal cache for data access by the RMC.
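The role of the per-tray board (the node midplane, rendered literally as "plate in node" in the machine translation) can be sketched as below. Everything here is hypothetical: the class `NodeMidplane`, the reader callbacks, and in particular the temperature-based fan curve, since the patent says only that the board performs fan speed control without saying how.

```python
from typing import Callable, Dict


def fan_duty_for(temp_c: float) -> int:
    # Hypothetical linear curve: 30% duty at or below 30 C, 100% at or above 80 C.
    if temp_c <= 30:
        return 30
    if temp_c >= 80:
        return 100
    return round(30 + (temp_c - 30) * 70 / 50)


class NodeMidplane:
    """Collects BMC, EMC, and fan-board data for one tray and caches it."""

    def __init__(self,
                 read_bmc: Callable[[], Dict[str, dict]],
                 read_emc: Callable[[], Dict[str, dict]],
                 read_fan_board: Callable[[], Dict[str, dict]],
                 set_fan_duty: Callable[[int], None]) -> None:
        self.read_bmc = read_bmc
        self.read_emc = read_emc
        self.read_fan_board = read_fan_board
        self.set_fan_duty = set_fan_duty
        self.cache: dict = {}

    def poll(self) -> dict:
        compute = self.read_bmc()
        storage = self.read_emc()
        fans = self.read_fan_board()
        # Drive the fans from the hottest node seen on this tray.
        temps = [n["temp_c"] for n in (*compute.values(), *storage.values())]
        duty = fan_duty_for(max(temps, default=0.0))
        self.set_fan_duty(duty)
        # Keep the merged view in the internal cache for the upper level.
        self.cache = {"compute": compute, "storage": storage,
                      "fans": fans, "fan_duty": duty}
        return self.cache


applied = []  # records the duty cycles sent to the (simulated) fan board
midplane = NodeMidplane(
    read_bmc=lambda: {"node0": {"temp_c": 55.0}},
    read_emc=lambda: {"stor0": {"temp_c": 40.0}},
    read_fan_board=lambda: {"fan0": {"rpm": 5200}},
    set_fan_duty=applied.append,
)
state = midplane.poll()
```

Passing the bus accessors in as callables keeps the sketch independent of any particular IPMB or I2C library.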
The RMC obtains the compute node, storage node, fan, and power information of the whole rack server through the node midplane, and directly polls the fan and power information of the devices connected directly to the RMC. It provides a unified external interface for monitoring SmartRack whole rack information.
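One way to picture the unified interface is a single lookup over data merged from both sources. The Python sketch below is an assumption-laden illustration: the class `RackManager`, the slash-separated path scheme, and the sample values are invented here and do not come from the patent.

```python
from typing import Callable, Dict


class RackManager:
    """Stand-in for the RMC: merges per-tray caches with directly polled devices."""

    def __init__(self, midplanes: Dict[str, Callable[[], dict]],
                 read_direct: Callable[[], dict]) -> None:
        self.midplanes = midplanes      # tray name -> function returning that tray's cache
        self.read_direct = read_direct  # fans and power wired straight to the RMC
        self.cache: dict = {}

    def poll(self) -> None:
        # Read every tray's cached data, then the directly connected devices.
        self.cache = {
            "trays": {tray: read() for tray, read in self.midplanes.items()},
            "direct": self.read_direct(),
        }

    def query(self, path: str):
        # Unified lookup: walk the cached tree along a slash-separated path.
        node = self.cache
        for key in path.split("/"):
            node = node[key]
        return node


rmc = RackManager(
    midplanes={"tray0": lambda: {"compute": {"node0": "ok"}, "fans": {"fan0": 5200}}},
    read_direct=lambda: {"psu0": {"status": "ok"}, "rack_fan0": 4800},
)
rmc.poll()
```

A caller then uses the same `query` call whether the value came from a tray or from a directly attached device, for example `rmc.query("trays/tray0/fans/fan0")` or `rmc.query("direct/psu0/status")`.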
Specifically, the interior of the SmartRack whole rack server can be divided into different trays. Each tray has a node midplane that serves as the second-level monitor for the rack's devices. The fan control board inside the tray is connected to the node midplane through an I2C bus, and the BMC and EMC are connected to the node midplane through an IPMB bus. The node midplane also traverses the fan and node information on the whole tray and saves it in its internal cache.
In this embodiment, the RMC, as the monitoring center of the rack, is connected to the power module through an I2C bus and to the node midplane through I2C, a serial port, or a network cable. It obtains power module fault information and status information over I2C, and obtains the fan and node information on each tray by accessing the node midplane.
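The connections described in this embodiment form a small tree rooted at the RMC, which can be written down as data. In the Python sketch below the link list follows the text (with "node midplane" as the per-tray board), while the helper `path_from_rmc` and the tuple layout are hypothetical conveniences.

```python
from typing import List, Tuple

# (upstream, downstream, bus) for each connection named in the embodiment.
LINKS: List[Tuple[str, str, str]] = [
    ("RMC", "power module", "I2C"),
    ("RMC", "node midplane", "I2C, serial port, or network cable"),
    ("node midplane", "fan control board", "I2C"),
    ("node midplane", "BMC", "IPMB"),
    ("node midplane", "EMC", "IPMB"),
]


def path_from_rmc(target: str) -> List[Tuple[str, str, str]]:
    """Return the hops from the RMC down to the target device."""
    parent = {child: (up, bus) for up, child, bus in LINKS}
    hops = []
    node = target
    while node != "RMC":
        if node not in parent:
            raise KeyError(f"unknown device: {target}")
        up, bus = parent[node]
        hops.append((up, node, bus))
        node = up
    return list(reversed(hops))
```

For example, `path_from_rmc("BMC")` returns two hops: RMC to node midplane, then node midplane to BMC over IPMB, which is why compute node data reaches the RMC only through the midplane's cache.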
As can be seen, the invention uses the RMC as the monitoring center of the whole rack server to provide service to users; the node midplane forms the second-level monitoring layer of the rack server and is responsible for monitoring the compute nodes, storage nodes, and fans connected to it; the BMC is responsible for monitoring the compute nodes and the EMC for monitoring the storage nodes. This achieves three-level monitoring of the SmartRack whole rack server, so that accessing whole-rack monitoring information through the RMC is as convenient for the user as accessing the BMC of a traditional server. The invention improves the timeliness with which the RMC monitors the compute nodes, storage nodes, fans, power supplies, and other devices inside the SmartRack rack, so that device faults can be detected and handled promptly and effectively, improving the availability and reliability of the whole rack server.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. A system for monitoring a whole rack server, characterized by comprising:
a data acquisition module, a data processing module, a control module, and a power module;
wherein the data acquisition module is configured to acquire running data describing the running state of the whole rack server and to store the acquired running data in an internal cache for data access by the data processing module;
the data processing module is connected to the data acquisition module and is configured to obtain the running data and to store the running data in an internal cache for data access by the control module;
the control module is connected to the data processing module and the power module and is configured to obtain the running data and the power information of the power module, so as to monitor the whole rack server in real time.
2. The system for monitoring a whole rack server as claimed in claim 1, characterized in that the data acquisition module comprises:
a first acquisition unit, configured to acquire the state of the compute nodes in the whole rack server and to store the acquired compute node status information in an internal cache for data access by the data processing module;
a second acquisition unit, configured to acquire the state of the storage nodes in the whole rack server and to store the acquired storage node status information in an internal cache for data access by the data processing module;
a third acquisition unit, configured to acquire fan information describing the running state of the fans of the whole rack server.
3. The system for monitoring a whole rack server as claimed in claim 2, characterized in that the control module is specifically configured to monitor the obtained running data and the fault information and status information of the power module, so as to perform fault monitoring on the whole rack server.
4. The system for monitoring a whole rack server as claimed in claim 3, characterized in that the third acquisition unit is specifically configured to obtain the fan information through a fan control board.
5. The system for monitoring a whole rack server as claimed in claim 4, characterized in that the data processing module is specifically a node midplane.
6. The system for monitoring a whole rack server as claimed in claim 5, characterized in that the fan control board is connected to the node midplane through an I2C bus.
7. The system for monitoring a whole rack server as claimed in claim 6, characterized in that the first acquisition unit and the second acquisition unit are connected to the node midplane through an IPMB bus.
8. The system for monitoring a whole rack server as claimed in claim 7, characterized in that the control module is connected to the node midplane through an I2C bus, a serial port, or a network cable.
9. The system for monitoring a whole rack server as claimed in any one of claims 1 to 8, characterized in that the control module is connected to the power module through an I2C bus.
CN201510745328.0A 2015-11-05 2015-11-05 System for monitoring a whole rack server Active CN105426286B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510745328.0A CN105426286B (en) 2015-11-05 2015-11-05 System for monitoring a whole rack server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510745328.0A CN105426286B (en) 2015-11-05 2015-11-05 System for monitoring a whole rack server

Publications (2)

Publication Number Publication Date
CN105426286A 2016-03-23
CN105426286B 2018-05-04

Family

ID=55504504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510745328.0A Active CN105426286B (en) 2015-11-05 2015-11-05 System for monitoring a whole rack server

Country Status (1)

Country Link
CN (1) CN105426286B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106371958A (en) * 2016-08-31 2017-02-01 浪潮电子信息产业股份有限公司 Server fault diagnosis system and method
CN106598810A (en) * 2016-12-16 2017-04-26 中国航空工业集团公司洛阳电光设备研究所 Multi-CPU airborne data processing unit BIT monitoring architecture
CN107239385A (en) * 2017-06-06 2017-10-10 郑州云海信息技术有限公司 A kind of server and instruction lamp control method
CN107239346A (en) * 2017-06-09 2017-10-10 郑州云海信息技术有限公司 A kind of whole machine cabinet computing resource tank node and computing resource pond framework
CN107248940A (en) * 2017-06-12 2017-10-13 郑州云海信息技术有限公司 A kind of whole machine cabinet monitoring management module, whole machine cabinet server and data center
CN107422276A (en) * 2017-07-31 2017-12-01 郑州云海信息技术有限公司 Device and method is surveyed in a kind of power cabinet physical examination
CN107543987A (en) * 2017-08-30 2018-01-05 郑州云海信息技术有限公司 A kind of Smart Rack condition monitoring systems and monitoring method
CN107977273A (en) * 2016-10-25 2018-05-01 郑州云海信息技术有限公司 The Memory Optimize Method of node information collection memory sharing in a kind of cabinet
CN108763022A (en) * 2018-05-28 2018-11-06 深圳市瑞驰信息技术有限公司 A kind of intelligent-platform management interface system based on I2C agreements
CN109101400A (en) * 2018-08-16 2018-12-28 郑州云海信息技术有限公司 A kind of monitoring system of cloud computation data center whole machine cabinet server
CN109586994A (en) * 2018-11-01 2019-04-05 郑州云海信息技术有限公司 A kind of whole machine cabinet server burn-in test monitoring method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100287395A1 (en) * 2009-05-06 2010-11-11 Via Technologies, Inc. Computer system for processing data in non-operational state and processing method thereof
CN102495785A (en) * 2011-12-23 2012-06-13 创新科存储技术(深圳)有限公司 Centralized management method and device for servers of whole equipment cabinet
CN104820479A (en) * 2015-04-24 2015-08-05 北京百度网讯科技有限公司 Controlling method and device for whole cabinet server fan

Also Published As

Publication number Publication date
CN105426286B (en) 2018-05-04

Similar Documents

Publication Publication Date Title
CN105426286A (en) System for monitoring whole rack server
CN105389244B (en) A kind of server monitoring method and device
Caulfield et al. A cloud-scale acceleration architecture
CN102567227B (en) Double-controller memory system and method for sharing cache equipment
CN102929769B (en) Virtual machine internal-data acquisition method based on agency service
CN105472291B (en) The digital hard disc video recorder and its implementation of multiprocessor cluster
CN108156225B (en) Micro-application monitoring system and method based on container cloud platform
CN102346707B (en) Server system and operation method thereof
CN105808499A (en) CPU interconnection device and multichannel server CPU interconnection topological structure
CN105373462A (en) Whole cabinet server management method and system
CN103716173A (en) Storage monitoring system and monitoring alarm issuing method
EP3123272A1 (en) Systems and methods for monitoring a configuration of ups groups with different redundancy levels
WO2015192664A1 (en) Device monitoring method and apparatus
CN105577430A (en) Node management method of high-end fault-tolerant server
CN105389242A (en) Method for achieving batch acquisition of server information of whole cabinet
CN103281208B (en) A kind of data backup & disaster recovery and comprehensive monitoring system
CN104461396B (en) A kind of distributed storage extension framework based on fusion architecture
CN104076880B (en) A kind of microserver
CN206460446U (en) A kind of supervising device for ruggedized computer mainboard
CN105577752A (en) Management system used for fusion framework server
CN103532728B (en) A kind of method and device resetted to failure dsp chip
CN102541714B (en) The implementation method of chip monitoring and device
CN109298687A (en) Data monitoring method and device for robot operation
CN106292911A (en) A kind of fusion architecture server
CN100547560C (en) A kind of computers group monitoring and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant