TW201513605A - System and method for monitoring multi-level devices - Google Patents

System and method for monitoring multi-level devices

Info

Publication number
TW201513605A
Authority
TW
Taiwan
Prior art keywords
abnormal
node
name
software system
person
Prior art date
Application number
TW102127070A
Other languages
Chinese (zh)
Inventor
Chung-I Lee
Yi-Guo Wang
Jian Huang
Hong-Bo Liang
Zheng-Lai Ding
Qian-Cheng Ma
Original Assignee
Hon Hai Prec Ind Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hon Hai Prec Ind Co Ltd
Publication of TW201513605A

Landscapes

  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a system and method for monitoring multi-level devices. The system is configured for: scanning the devices and obtaining information on each device; determining, according to that information, whether each device has an error; notifying the person in charge of a device in response to a determination that the device has an error; and searching for the software applications installed on the device and notifying the person in charge of each software application.

Description

Multi-layer cascaded service monitoring system and method

The present invention relates to a virtual machine control system and method, and more particularly to a multi-layer cascaded service monitoring system and method.

A data center, also known as a server farm, usually comprises anywhere from a few to tens of thousands of devices. It is a facility used to house computer systems and related components, such as telecommunications and storage systems. A data center typically includes redundant and backup power supplies, redundant data communication connections, environmental controls (such as air conditioning and fire extinguishers), and security equipment. The most important devices in a data center are those used to store data. One or more virtual machines are installed on each such device, and one or more software systems (for example, an electronic signature system) are installed on each virtual machine. To ensure that the devices and software in the data center operate normally, they need to be monitored. However, conventional monitoring generally focuses on direct, one-to-one monitoring of individual devices and software. It cannot monitor the business logic deployed on the devices or software; that is, it cannot monitor interrelated devices, nor can it monitor interrelated software systems.

In view of the above, it is necessary to provide a multi-layer cascaded service monitoring system and method that can monitor interrelated devices as well as interrelated software systems, and that promptly notifies the relevant persons in charge when a device in the data center becomes abnormal, so that they learn of the situation in time and the waiting time for device maintenance is shortened.

A multi-layer cascaded service monitoring system runs on a monitoring computer and includes: a scanning module for scanning the devices of a data center to obtain operation information of each device in the data center; a determination module for determining, according to the operation information of each device, whether any device in the data center is abnormal; a notification module for notifying, when a device in the data center is abnormal, the person in charge of managing the abnormal device; and a search module for finding, according to the name of the abnormal device, the software systems affected by the abnormal device and notifying the persons in charge associated with those software systems.

A multi-layer cascaded service monitoring method includes: scanning the devices of a data center to obtain operation information of each device in the data center; determining, according to the operation information of each device, whether any device in the data center is abnormal; when a device in the data center is abnormal, notifying the person in charge of managing the abnormal device; and finding, according to the name of the abnormal device, the software systems affected by the abnormal device and notifying the persons in charge associated with those software systems.

Compared with the prior art, the multi-layer cascaded service monitoring system and method provided by the present invention can monitor interrelated devices as well as interrelated software systems, and promptly notify the relevant persons in charge when a device in the data center becomes abnormal, so that they learn of the situation in time and the waiting time for device maintenance is shortened.

10‧‧‧Client

20‧‧‧Monitoring computer

30‧‧‧Database

40‧‧‧Network

50‧‧‧Data center

500‧‧‧Device

200‧‧‧Multi-layer cascaded service monitoring system

210‧‧‧Scanning module

220‧‧‧Determination module

230‧‧‧Notification module

240‧‧‧Search module

250‧‧‧Storage

260‧‧‧Processor

FIG. 1 is an application environment diagram of a preferred embodiment of the multi-layer cascaded service monitoring system of the present invention.

FIG. 2 is a schematic structural diagram of a preferred embodiment of the monitoring computer of the present invention.

FIG. 3 is a flowchart of a preferred embodiment of the multi-layer cascaded service monitoring method of the present invention.

FIG. 4 is a detailed flowchart of step S40 of the multi-layer cascaded service monitoring method of the present invention, in which the software systems affected by an abnormal device are found according to the name of the abnormal device and the persons in charge associated with those software systems are notified.

FIG. 5 is a schematic diagram of the logical tree of the present invention.

Referring to FIG. 1, an application environment diagram of a preferred embodiment of the multi-layer cascaded service monitoring system 200 of the present invention is shown. The multi-layer cascaded service monitoring system 200 runs on the monitoring computer 20. The monitoring computer 20 communicates with the data center 50 over the network 40.

The network 40 may be the Internet, a local area network, or another communication network.

The data center 50 includes a plurality of devices 500 (four are shown in the figure as an example), and each device 500 is a server. In this embodiment, the server is called a host, and one or more virtual machines are installed on each host. One or more software systems (for example, a certificate verification system or a password protection system) are installed on each virtual machine. The software systems are associated with one another through a logical tree. In the logical tree, the name of each software system corresponds to one node; a node may have child nodes, and several nodes may share the same child node. FIG. 5 shows one such logical tree: node A has two child nodes, A1 and A2, and A1 in turn has a child node A11; node B has two child nodes, B1 and B2, and B2 in turn has a child node B21; A2 and B1 share the child node C. Following the relationships in the logical tree, one can start from any child node and trace upward to the topmost node, or trace downward to the bottommost node. Software systems may be installed in the same virtual machine of the same device 500, in different virtual machines of the same device 500, or in virtual machines of different devices 500. For example, the software systems corresponding to child nodes A1 and A2 of node A may be installed in the same virtual machine of one device 500, in different virtual machines of one device 500, or in virtual machines of two different devices 500.
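The node relationships described for FIG. 5 can be written down as a small data structure. The following Python sketch is only an illustration and is not taken from the patent; the names `CHILDREN` and `parents_of` are hypothetical. Note that because node C has two parents (A2 and B1), the "logical tree" is, strictly speaking, a directed acyclic graph.

```python
# Hypothetical encoding of the FIG. 5 logical tree: each key is a node
# (a software system name) and each value lists its direct child nodes.
CHILDREN = {
    "A": ["A1", "A2"],
    "A1": ["A11"],
    "A2": ["C"],
    "B": ["B1", "B2"],
    "B1": ["C"],
    "B2": ["B21"],
    "A11": [],
    "B21": [],
    "C": [],
}


def parents_of(node):
    """Return the nodes one layer above `node`, in sorted order."""
    return sorted(p for p, kids in CHILDREN.items() if node in kids)


print(parents_of("C"))  # ['A2', 'B1'] : C is shared by A2 and B1
print(parents_of("A"))  # [] : A is a topmost node
```

Storing only the child lists keeps the structure close to the figure; the parent direction is derived on demand.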

The monitoring computer 20 monitors the operation of the devices 500 in the data center 50. When a device 500 becomes abnormal, it notifies the relevant persons in charge (for example, the person in charge of managing the data center 50, the person in charge of managing the device 500, the person in charge of installing virtual machines on the device 500, the person in charge of maintaining the software systems installed on the virtual machines, and the users of those software systems).

In addition, the data center 50 includes one or more environmental instruments (for example, temperature sensors, humidity sensors, fans, transformers, and voltage and current detectors). The monitoring computer 20 also monitors the operation of these environmental instruments. When a value on an environmental instrument exceeds a set value (for example, the temperature on a temperature sensor exceeds seventy degrees) or an environmental instrument stops working, the monitoring computer 20 notifies the relevant persons in charge (for example, the person in charge of managing the data center 50 or the person in charge of repairing the environmental instruments). Specifically, the monitoring computer 20 establishes a communication connection with the environmental instruments through the Simple Network Management Protocol (SNMP), obtains the data on the environmental instruments (for example, temperature, humidity, current, and voltage) in a timely manner, and analyzes the obtained data to determine whether an environmental instrument is abnormal.
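The analysis of environmental readings can be sketched as a simple threshold check. The Python below is a minimal illustration under stated assumptions: the readings are taken as already retrieved (the SNMP exchange itself is not shown), the names `LIMITS` and `instrument_is_abnormal` are hypothetical, and only the 70-degree temperature limit comes from the description; the humidity limit is invented for the example.

```python
# Hypothetical upper limits per reading type. Only the 70-degree temperature
# limit appears in the description; the humidity limit is an assumption.
LIMITS = {"temperature": 70.0, "humidity": 80.0}


def instrument_is_abnormal(kind, reading):
    """Return True if the instrument gave no reading or exceeds its limit.

    `reading` is None when the instrument could not be read, which is
    treated here as "unable to work"; otherwise it is a numeric value.
    Reading types without a configured limit are never flagged by value.
    """
    if reading is None:
        return True
    limit = LIMITS.get(kind)
    return limit is not None and reading > limit
```

A reading exactly at the limit is not flagged, since the description speaks of values that exceed the set value.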

The monitoring computer 20 is connected to the database 30 through a database connection, which may be an Open Database Connectivity (ODBC) connection or a Java Database Connectivity (JDBC) connection. The database 30 stores the names of the devices 500 in the data center 50, the names of the virtual machines installed on each device 500, the names of the software systems installed on each virtual machine, and the logical tree. The database 30 also stores the contact information of the person in charge of managing the data center 50, of the person in charge of managing each device 500, of the person in charge of installing virtual machines on each device 500, of the person in charge of maintaining the software systems installed on the virtual machines, and of the users of those software systems. The contact information includes, but is not limited to, e-mail addresses and telephone numbers.
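One possible layout for the stored names is sketched below. This is an assumption-laden illustration: the description names ODBC or JDBC as the connection type, whereas the sketch uses Python's built-in `sqlite3` module as a stand-in, and all table names, column names, sample rows, and the helper `software_on_device` are hypothetical.

```python
import sqlite3

# In-memory stand-in for database 30; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (
    name TEXT PRIMARY KEY,
    manager_contact TEXT
);
CREATE TABLE virtual_machines (
    name TEXT PRIMARY KEY,
    device TEXT REFERENCES devices(name)
);
CREATE TABLE software_systems (
    name TEXT PRIMARY KEY,
    vm TEXT REFERENCES virtual_machines(name),
    maintainer_contact TEXT
);
""")
# Sample rows, invented for the example.
conn.execute("INSERT INTO devices VALUES ('host-1', 'ops@example.com')")
conn.execute("INSERT INTO virtual_machines VALUES ('vm-1', 'host-1')")
conn.execute(
    "INSERT INTO software_systems VALUES ('A2', 'vm-1', 'a2-owner@example.com')"
)


def software_on_device(device_name):
    """Return the names of the software systems installed on a device."""
    rows = conn.execute(
        "SELECT s.name FROM software_systems s "
        "JOIN virtual_machines v ON s.vm = v.name "
        "WHERE v.device = ? ORDER BY s.name",
        (device_name,),
    )
    return [row[0] for row in rows]
```

Keying every table by name mirrors the description, in which devices, virtual machines, and software systems are all looked up by name.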

It should be noted that the database 30 may be independent of the monitoring computer 20 or located within it. The database 30 may be stored on a hard disk or flash storage drive of the monitoring computer 20. For reasons of system security, the database 30 in this embodiment is independent of the monitoring computer 20.

In addition, the client 10 provides an interactive interface to the user, allowing the user to perform operations and store the data generated during those operations in the monitoring computer 20. The client 10 may be a personal computer, a notebook computer, a mobile phone, a tablet computer, or any other device that can connect to the monitoring computer 20.

Referring to FIG. 2, a schematic structural diagram of a preferred embodiment of the monitoring computer 20 of the present invention is shown. The monitoring computer 20 further includes a storage 250 and a processor 260. The multi-layer cascaded service monitoring system 200 includes a scanning module 210, a determination module 220, a notification module 230, and a search module 240. The program code of modules 210 to 240 is stored in the storage 250, and the processor 260 executes this program code to implement the functions of the multi-layer cascaded service monitoring system 200 described above.

The scanning module 210 scans the devices 500 of the data center 50 to obtain the operation information of each device 500 in the data center 50. Specifically, the scanning module 210 calls the HttpClient control to obtain the operation information of each device 500. The operation information includes CPU usage, fan speed status, hard disk usage, hard disk status, and memory status.

The determination module 220 determines, according to the operation information of each device 500, whether any device 500 in the data center 50 is abnormal. Specifically, it compares the obtained operation information of each device 500 with the operation information of that device 500 during normal operation. For example, if the obtained CPU usage of a device 500 is 95% while normal operation requires a CPU usage below 85%, the determination module 220 determines that the device 500 is abnormal. In addition, because the devices 500 in the data center 50 may be interrelated, an abnormality in one device 500 also affects the operation of the devices 500 associated with it. The database 30 therefore also stores a plurality of relationship lists among the devices 500, each containing a plurality of interrelated devices 500. Within each relationship list, if any one device 500 is abnormal, the other devices 500 in that list are also deemed abnormal. For convenience of description, a device 500 that is abnormal is referred to as an abnormal device 500.
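The comparison and the relationship-list rule can be sketched together as follows. This Python is illustrative only: the function name `abnormal_devices`, the input shapes, and the constant `CPU_LIMIT` are hypothetical, and CPU usage stands in for the full set of operation information. The sketch applies the rule once per list, based on the devices found abnormal by direct comparison; whether abnormality should also chain across overlapping lists is not stated in the description.

```python
# The 85% ceiling is the example figure from the description.
CPU_LIMIT = 85.0


def abnormal_devices(cpu_usage, relation_lists):
    """Return the set of device names treated as abnormal.

    cpu_usage: dict mapping device name -> CPU usage in percent.
    relation_lists: list of lists, each holding interrelated device names.
    """
    # Normal operation requires usage below the limit, so a device at or
    # above the limit is directly abnormal.
    directly = {name for name, cpu in cpu_usage.items() if cpu >= CPU_LIMIT}
    result = set(directly)
    # If any member of a relationship list is directly abnormal, every
    # member of that list is deemed abnormal as well.
    for group in relation_lists:
        if directly.intersection(group):
            result.update(group)
    return result


usage = {"h1": 95.0, "h2": 40.0, "h3": 30.0}
print(sorted(abnormal_devices(usage, [["h1", "h2"], ["h3"]])))  # ['h1', 'h2']
```

Here h2 is healthy by its own reading but is still reported, because it shares a relationship list with h1.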

The notification module 230 notifies the person in charge of managing the abnormal device 500. Specifically, the notification module 230 uses the name of the abnormal device 500 to look up, in the database 30, the contact information of the person in charge of managing that device, and sends a prompt message to that person by e-mail or text message.

The search module 240 finds, according to the name of the abnormal device, the software systems affected by the abnormal device and notifies the persons in charge associated with those software systems. The specific way in which this is done is described in detail with reference to FIG. 4.

Referring to FIG. 3, a flowchart of a preferred embodiment of the multi-layer cascaded service monitoring method of the present invention is shown.

In step S10, the scanning module 210 scans the devices 500 of the data center 50 to obtain the operation information of each device 500 in the data center 50. Specifically, the scanning module 210 calls the HttpClient control to obtain the operation information of each device 500. The operation information includes CPU usage, fan speed status, hard disk usage, hard disk status, and memory status.

In step S20, the determination module 220 determines, according to the operation information of each device 500, whether any device 500 in the data center 50 is abnormal. In addition, because the devices 500 in the data center 50 may be interrelated, an abnormality in one device 500 also affects the operation of the devices 500 associated with it. The database 30 therefore also stores a plurality of relationship lists among the devices 500, each containing a plurality of interrelated devices 500. Within each relationship list, if any one device 500 is abnormal, the other devices 500 in that list are also deemed abnormal. For convenience of description, a device 500 that is abnormal is referred to as an abnormal device 500. If no device 500 in the data center 50 is abnormal, the flow returns to step S10. If a device 500 in the data center 50 is determined to be abnormal, the flow proceeds to step S30.

In step S30, the notification module 230 notifies the person in charge of managing the abnormal device 500. Specifically, the notification module 230 uses the name of the abnormal device 500 to look up, in the database 30, the contact information of the person in charge of managing that device, and sends a prompt message to that person by e-mail or text message.

In step S40, the search module 240 finds, according to the name of the abnormal device, the software systems affected by the abnormal device and notifies the persons in charge associated with those software systems. The specific way in which this is done is described in detail with reference to FIG. 4.
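The overall flow of steps S10 to S40 can be summarized as one pass of a monitoring loop. The following Python is a rough sketch, not the patented implementation: the function `monitor_once` and its four callable parameters are hypothetical placeholders for the scanning, determination, notification, and search modules.

```python
def monitor_once(scan, is_abnormal, notify_device_owner, notify_software_owners):
    """Run steps S10 to S40 once and return the names of abnormal devices.

    scan: callable returning a dict of device name -> operation information.
    is_abnormal: callable deciding whether one device's information is abnormal.
    notify_device_owner / notify_software_owners: callables taking a device name.
    """
    info = scan()  # S10: collect operation information
    # S20: decide which devices are abnormal
    abnormal = [name for name, data in info.items() if is_abnormal(data)]
    for name in abnormal:
        notify_device_owner(name)     # S30: notify the device's manager
        notify_software_owners(name)  # S40: notify owners of affected software
    # An empty list means no device is abnormal; the caller would then
    # simply start the next pass, which corresponds to returning to S10.
    return abnormal
```

Passing the modules in as callables keeps the sketch independent of how scanning and notification are actually carried out.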

Referring to FIG. 4, a detailed flowchart of step S40 of FIG. 3 is shown, in which the software systems affected by the abnormal device are found according to the name of the abnormal device and the persons in charge associated with those software systems are notified.

In step S410, the search module 240 obtains the names of the software systems on the abnormal device 500 according to the name of the abnormal device 500. Specifically, the search module 240 looks up the names of the software systems on the abnormal device 500 in the database 30.

In step S420, the search module 240 obtains the names of the other affected software systems according to the names of the software systems on the abnormal device 500 and the logical tree. This is done in three steps. First, find the node in the logical tree that corresponds to the name of a software system on the abnormal device 500. Second, search upward: obtain the node one layer above that node and continue recursively until the corresponding root node is reached. Third, search downward: obtain the nodes one layer below that node and continue recursively until the bottommost nodes are reached. Because each node corresponds to the name of one software system, the nodes found in this way give the names of the other affected software systems. For example, as shown in FIG. 5, suppose the obtained software system name corresponds to node A2 in the logical tree. The search module 240 then finds node A, one layer above A2, and node C, one layer below A2. If node A had a node above it, the upward search would continue recursively until a node with no node above it was reached. Likewise, if node C had nodes below it, the downward search would continue recursively until nodes with no nodes below them were reached.
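The upward and downward recursive searches of step S420 can be sketched as follows. This Python is an illustrative reading of the description rather than the patented implementation; the names `CHILDREN` and `affected_nodes` are hypothetical, and the tree is the FIG. 5 example.

```python
# Hypothetical encoding of the FIG. 5 logical tree (node -> child nodes).
CHILDREN = {
    "A": ["A1", "A2"], "A1": ["A11"], "A2": ["C"],
    "B": ["B1", "B2"], "B1": ["C"], "B2": ["B21"],
    "A11": [], "B21": [], "C": [],
}


def affected_nodes(start):
    """Return `start` plus all of its ancestors and descendants, sorted."""
    # Derive the upward direction (node -> parent nodes) from CHILDREN.
    parents = {}
    for parent, kids in CHILDREN.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)

    found = {start}

    def walk(node, neighbours):
        # Recursively follow one direction until no further node exists.
        for nxt in neighbours.get(node, []):
            if nxt not in found:
                found.add(nxt)
                walk(nxt, neighbours)

    walk(start, parents)   # upward search, up to the root node(s)
    walk(start, CHILDREN)  # downward search, down to the bottommost nodes
    return sorted(found)


print(affected_nodes("A2"))  # ['A', 'A2', 'C']
print(affected_nodes("C"))   # ['A', 'A2', 'B', 'B1', 'C']
```

Starting from A2 reproduces the example in the text (A, A2, and C). Starting from C reaches both roots, because C is shared by A2 and B1. Sibling nodes such as A1 are not visited, since the search only moves straight up and straight down from the starting node.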

In step S430, the search module 240 notifies, according to the names of all the obtained software systems, the persons in charge of managing those software systems. Specifically, if the nodes found are A, A2, and C, the persons in charge of managing the software systems corresponding to those three nodes are notified.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments above, a person of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently substituted without departing from their spirit and scope.


Claims (12)

1. A multi-layer cascaded service monitoring system running on a monitoring computer, the system comprising:
a scanning module configured to scan the devices of a data center to obtain operation information of each device in the data center;
a determination module configured to determine, according to the operation information of each device, whether any device in the data center is abnormal;
a notification module configured to notify, when a device in the data center is abnormal, the person in charge of managing the abnormal device; and
a search module configured to find, according to the name of the abnormal device, the software systems affected by the abnormal device and to notify the persons in charge associated with those software systems.
2. The multi-layer cascaded service monitoring system of claim 1, wherein the operation information of the device includes CPU usage, fan speed status, hard disk usage, hard disk status, and memory status.
3. The multi-layer cascaded service monitoring system of claim 1, wherein whether a device is abnormal is determined by comparing the obtained operation information of each device with the operation information of the device during normal operation.
4. The multi-layer cascaded service monitoring system of claim 1, wherein the database stores a plurality of relationship lists, each relationship list containing a plurality of interrelated devices, and wherein, if any one device in a relationship list is abnormal, the other devices in that relationship list are also deemed abnormal.
5. The multi-layer cascaded service monitoring system of claim 1, wherein finding the software systems affected by the abnormal device according to the name of the abnormal device and notifying the persons in charge associated with those software systems comprises:
obtaining the names of the software systems on the abnormal device according to the name of the abnormal device;
obtaining the names of other affected software systems according to the names of the software systems on the abnormal device and a logical tree; and
notifying, according to the names of all the obtained software systems, the persons in charge of managing those software systems.
6. The multi-layer cascaded service monitoring system of claim 5, wherein obtaining the names of other affected software systems according to the names of the software systems on the abnormal device and the logical tree comprises:
finding, in the logical tree, the node corresponding to the name of a software system on the abnormal device, each node in the logical tree corresponding to the name of one software system;
searching upward recursively, obtaining the node one layer above that node, until the root node corresponding to that node is obtained; and
searching downward recursively, obtaining the nodes one layer below that node, until the bottommost nodes under that node are obtained.
一種多層級聯業務監控方法,該方法包括:
掃描資料中心的設備,以獲取資料中心中每個設備的運行資訊;
根據每個設備的運行資訊判斷資料中心是否有設備出現異常;
當資料中心有設備出現異常時,通知管理上述異常設備的負責人;及
根據異常設備的名稱查找到受該異常設備影響的軟體系統,並通知與所述軟體系統相關的負責人。
A multi-layer cascading service monitoring method, the method comprising:
Scan the data center equipment to obtain information about the operation of each device in the data center;
Judging whether there is any abnormality in the data center according to the operation information of each device;
When there is an abnormality in the equipment in the data center, the person in charge of managing the abnormal device is notified; and the software system affected by the abnormal device is found according to the name of the abnormal device, and the person in charge related to the software system is notified.
The multi-layer cascading service monitoring method according to claim 7, wherein the operating information of each device includes CPU usage, fan speed status, hard disk usage, hard disk status, and memory status.
The multi-layer cascading service monitoring method according to claim 7, wherein whether a device is abnormal is determined by comparing the obtained operating information of each device with the operating information of that device during normal operation.
The multi-layer cascading service monitoring method according to claim 7, wherein the database stores a plurality of relationship lists, each relationship list contains a plurality of interrelated devices, and, within each relationship list, if one device is abnormal, the other devices in that relationship list are also deemed abnormal.
The multi-layer cascading service monitoring method according to claim 7, wherein finding the software systems affected by the abnormal device according to the name of the abnormal device, and notifying the persons in charge of those software systems, is implemented as follows:
Obtaining the name of the software system on the abnormal device according to the name of the abnormal device;
Obtaining the names of other affected software systems according to the name of the software system on the abnormal device and a logic tree; and
Notifying the persons in charge of managing the above software systems according to the names of all the obtained software systems.
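The dependent claims above (comparison against normal-operation readings, relationship lists, and the lookup from device name to software system to person in charge) can be sketched roughly as follows. This is only an illustration under stated assumptions: the device names, readings, tolerance, relationship lists, and lookup tables are all hypothetical, and the claims do not say how the comparison is scored. The logic-tree step is left out of this sketch.

```python
# Hypothetical data; none of these names or values come from the patent text.
NORMAL_INFO = {  # operating information recorded during normal operation
    "server-a": {"cpu_usage": 40.0, "disk_usage": 50.0},
    "switch-1": {"cpu_usage": 10.0, "disk_usage": 5.0},
    "storage-1": {"cpu_usage": 20.0, "disk_usage": 60.0},
}
RELATIONSHIP_LISTS = [["server-a", "switch-1"], ["storage-1"]]
SOFTWARE_ON_DEVICE = {
    "server-a": "order-system",
    "switch-1": "gateway",
    "storage-1": "archive",
}
PERSON_IN_CHARGE = {"order-system": "alice", "gateway": "bob", "archive": "carol"}
TOLERANCE = 30.0  # assumed allowed deviation from the normal reading


def is_abnormal(device, current):
    """Compare current readings with the device's normal-operation readings."""
    normal = NORMAL_INFO[device]
    return any(abs(current[key] - normal[key]) > TOLERANCE for key in normal)


def expand_by_relationship(abnormal):
    """Mark every device that shares a relationship list with a detected device.

    Assumption: propagation is not transitive across lists; only devices
    detected directly can pull in the rest of their list.
    """
    expanded = set(abnormal)
    for group in RELATIONSHIP_LISTS:
        if abnormal.intersection(group):
            expanded.update(group)
    return expanded


def persons_to_notify(devices):
    """Map device names to software system names, then to persons in charge."""
    software = {SOFTWARE_ON_DEVICE[d] for d in devices if d in SOFTWARE_ON_DEVICE}
    return {name: PERSON_IN_CHARGE[name] for name in sorted(software)}


readings = {
    "server-a": {"cpu_usage": 95.0, "disk_usage": 55.0},
    "switch-1": {"cpu_usage": 12.0, "disk_usage": 5.0},
    "storage-1": {"cpu_usage": 22.0, "disk_usage": 61.0},
}
detected = {d for d, info in readings.items() if is_abnormal(d, info)}
affected_devices = expand_by_relationship(detected)
notifications = persons_to_notify(affected_devices)
print(sorted(affected_devices))
print(notifications)
```

In this example only server-a deviates from its normal readings, but switch-1 is treated as abnormal as well because the two share a relationship list, so the persons in charge of both software systems would be notified.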
The multi-layer cascading service monitoring method according to claim 11, wherein the names of other affected software systems are obtained according to the name of the software system on the abnormal device and the logic tree as follows:
Finding, in the logic tree, the node corresponding to the name of the software system on the abnormal device, each node in the logic tree corresponding to the name of one software system;
Searching upward recursively, obtaining the node one level above the current node, until the root node for that node is reached; and
Searching downward recursively, obtaining the nodes one level below the current node, until the bottom-level nodes under that node are reached.
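The upward and downward recursive searches described above can be illustrated with a small sketch. The tree contents and the function names (`ancestors`, `descendants`, `affected_software`) are hypothetical, and storing the tree as a child-to-parent mapping is an assumption made for brevity; the claim only requires that each node correspond to one software system name.

```python
# Hypothetical logic tree: each key is a software system name, each value its parent.
PARENT = {
    "portal": None,  # root node
    "order-system": "portal",
    "billing": "order-system",
    "reporting": "order-system",
    "hr-system": "portal",
}


def children_of(name):
    """Return the nodes one level below the given node."""
    return [child for child, parent in PARENT.items() if parent == name]


def ancestors(name):
    """Recursively look upward until the root node is reached."""
    parent = PARENT[name]
    if parent is None:
        return []
    return [parent] + ancestors(parent)


def descendants(name):
    """Recursively look downward until the bottom-level nodes are reached."""
    found = []
    for child in children_of(name):
        found.append(child)
        found.extend(descendants(child))
    return found


def affected_software(name):
    """Collect the starting node plus every node found above and below it."""
    return [name] + ancestors(name) + descendants(name)


print(affected_software("order-system"))
```

Under this reading, sibling branches such as hr-system are not collected when the search starts from order-system, because only the path up to the root and the subtree below the starting node are visited.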
Application TW102127070A, priority date 2013-06-28, filing date 2013-07-29: System and method for monitoring multi-level devices, published as TW201513605A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310263788.0A CN104253715A (en) 2013-06-28 2013-06-28 Monitoring system and method of multi-level cascade business

Publications (1)

Publication Number Publication Date
TW201513605A (en) 2015-04-01

Family

ID=52188281

Family Applications (1)

Application Number Title Priority Date Filing Date
TW102127070A TW201513605A (en) 2013-06-28 2013-07-29 System and method for monitoring multi-level devices

Country Status (2)

Country Link
CN (1) CN104253715A (en)
TW (1) TW201513605A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106598764B (en) * 2015-10-14 2019-12-03 北京国双科技有限公司 Abnormality eliminating method and device
CN109558292A (en) * 2017-09-26 2019-04-02 阿里巴巴集团控股有限公司 A kind of monitoring method and device
CN108628720A (en) * 2018-05-02 2018-10-09 济南浪潮高新科技投资发展有限公司 Equipment monitoring system and method under a kind of cascade scene
CN111490900B (en) * 2020-03-30 2022-12-16 中移(杭州)信息技术有限公司 Gateway fault positioning method and device and gateway equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1547120A (en) * 2003-12-10 2004-11-17 沈阳东软软件股份有限公司 Network monitoring management system
EP2262172A1 (en) * 2009-06-10 2010-12-15 Alcatel Lucent Method and scout agent for building a source database
CN101778017B (en) * 2010-01-05 2012-05-23 中国工商银行股份有限公司 Method and server for processing on-line transaction fault event of mainframe
CN102693177B (en) * 2011-03-23 2015-02-04 ***通信集团公司 Fault diagnosing and processing methods of virtual machine as well as device and system thereof

Also Published As

Publication number Publication date
CN104253715A (en) 2014-12-31
