CN113946497A - Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources - Google Patents
- Publication number
- CN113946497A (application CN202111028927.2A)
- Authority
- CN
- China
- Prior art keywords
- alarm
- monitoring
- alarming
- management
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources, which comprises the following steps. Step S1: monitoring and alarming are performed by an integrated intelligent monitoring and alarm platform. Depending on the actual IT environment, the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts, and a single DCS or multiple DCS can be used for management capacity planning according to the scale of the managed objects. Both centralized and distributed deployment modes are thereby supported, and heterogeneous IT resources, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly. The integrated platform manages, controls and analyzes the usage of the cloud platforms in a unified and comprehensive way, which simplifies monitoring and alarming; the data acquisition sources are also broadened, which makes the cloud monitoring index information more complete and improves the accuracy of alarms.
Description
Technical Field
The invention relates to the technical field of cloud platforms, in particular to a method suitable for unified intelligent monitoring and alarming of multiple cloud platform resources.
Background
Cloud monitoring refers to a monitoring service that covers the availability, user experience and security of networks, systems, applications and similar content. Its purpose is to keep the services of cloud computing users running stably and securely. In a cloud computing platform, monitoring and managing resources is a key link in ensuring platform reliability: it allows effective management schemes to be built on resource monitoring so as to improve resource utilization, and it ensures that a fault is detected as early as possible when it occurs and is resolved in the most effective way.
In a traditional cloud data center, the range of collected data is narrow and only a single alarm mode is available, both of which reduce the accuracy of monitoring alarms. A method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources is therefore proposed.
Disclosure of Invention
The problem solved by the invention is to provide a method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources. The method uses an integrated intelligent monitoring and alarm platform to manage, control and analyze the usage of the cloud platforms in a unified and comprehensive way, which simplifies monitoring and alarming; it also broadens the data acquisition sources, which makes the cloud monitoring index information more complete and improves the accuracy of alarms.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources comprises the following steps:
step S1: monitoring and alarming are performed by an integrated intelligent monitoring and alarm platform; depending on the actual IT environment, the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts, and a single DCS or multiple DCS can be used for management capacity planning according to the scale of the managed objects, so that both centralized and distributed deployment modes are supported and heterogeneous IT resources, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly;
step S2: the integrated intelligent monitoring and alarm platform comprises a data acquisition layer, a data processing layer and a data presentation layer;
step S3: the data acquisition layer collects the required cloud monitoring index information from the managed devices through multiple network protocols and interfaces, including SNMP/SNMP Trap, Telnet, SSH, WMI, JDBC, Syslog and open APIs; the collected data is placed in a cache for analysis and computation and is then stored in a database for the upper-layer platform to analyze and display;
step S4: the data processing layer comprises multiple subsystems, such as resource monitoring, service flow management, configuration management, asset management and operation and maintenance big data analysis; it comprises one or more DCS and receives the data collected by each DCS, and, by analyzing and mining the collected data, it provides the performance data on which the front-end display is based; when an index exceeds its threshold, it generates a fault alarm and sends it to the data presentation layer;
step S5: the data presentation layer uses Web technology to provide role-based, visual data display and management; it manages the IT resource environment comprehensively through functions such as business management, resource management, topology management, inspection management and alarm management, and provides extensive statistical and analytical data and display pages to meet the needs of daily work; it provides alarm monitoring for the network and multiple modes of data integration with the cloud platform and the dynamic environment system; using the big data platform component's operation and maintenance index evaluation and analysis model, it mines information on services, indexes and faults from the patterns of change in historical data, which helps to locate root causes and points for improvement; it presents the integrated monitoring information and alarm information in the platform and performs service correlation analysis and alarm correlation analysis;
step S6: the Nagios open-source monitoring system is used for alarming;
the conditions involved in an alarm include: the alarm trigger, the alarm threshold and the alarm period;
when an alarm is triggered, a log entry must be written, and when an alarm item cannot be started, an alarm failure prompt is sent to the user.
As a further scheme of the invention: in step S2, the platform adopts a modular design with loosely coupled modules; a new module can be connected to the platform directly, and the modules communicate with each other through interfaces and message queues.
As a further scheme of the invention: in step S3, the cloud monitoring index information covers monitoring of the server itself and the performance of web sites;
the monitoring of the server itself includes: CPU utilization rate, CPU load, memory utilization rate, disk space utilization rate, disk I/O, network flow, system process number, process CPU/memory/state monitoring, service monitoring and log monitoring;
the web site performance indexes include: site URL/HTTP availability and response time, UDP/TCP port availability and response time, POP3/SMTP/FTP port availability and response time.
As a further scheme of the invention: in step S5, the user can configure policies for receiving, filtering and alarming on Syslog and SNMP Trap messages, view the received Syslog and SNMP Trap messages, and manually synchronize device information indexes, so that the monitoring data is timely and accurate.
As a further scheme of the invention: in step S5, all backbone network devices, subnets and interconnection relationships of the whole network are displayed in a visually clear manner through topology management; the hierarchical network display follows the logical structure of the network and is associated with the Syslog alarm information of the devices, which facilitates fault isolation and rapid fault location; topology management provides functions such as intuitive 2D machine room topology management, automatic map topology management, IP-MAC-PORT and real panel management, and integrates the remote operation and maintenance tools Telnet, SSH, TraceRT and Ping, which facilitates remote control of IT resources.
As a further scheme of the invention: in step S6, alarms are delivered in the following ways:
1) through the web console;
2) by Email;
3) by mobile phone short message.
The invention has the following beneficial effects: an integrated intelligent monitoring and alarm platform manages, controls and analyzes the usage of the cloud platforms in a unified and comprehensive way, which simplifies monitoring and alarming; the data acquisition sources are also broadened, which makes the cloud monitoring index information more complete and improves the accuracy of alarms.
Drawings
FIG. 1 is a schematic structural diagram of an intelligent monitoring and warning platform according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Specific examples are given below.
Referring to fig. 1, a method for unified intelligent monitoring and alarming of resources of a multi-cloud platform includes the following steps:
step S1: monitoring and alarming are performed by an integrated intelligent monitoring and alarm platform; depending on the actual IT environment, the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts, and a single DCS or multiple DCS can be used for management capacity planning according to the scale of the managed objects, so that both centralized and distributed deployment modes are supported and heterogeneous IT resources, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly;
step S2: the integrated intelligent monitoring and alarm platform comprises a data acquisition layer, a data processing layer and a data presentation layer;
the platform adopts a modular design with loosely coupled modules; a new module can be connected to the platform directly, and the modules communicate with each other through interfaces and message queues;
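The loose coupling described in step S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the names `Module`, `MessageBus` and `AlarmModule` are hypothetical, and an in-process `queue.Queue` stands in for whatever message queue the platform actually uses.

```python
import queue
from typing import List, Protocol


class Module(Protocol):
    """Hypothetical interface that every platform module implements."""

    name: str

    def handle(self, message: dict) -> None: ...


class MessageBus:
    """In-process stand-in for the message queue between modules."""

    def __init__(self) -> None:
        self._queue = queue.Queue()
        self._modules: List[Module] = []

    def register(self, module: Module) -> None:
        # A new module is attached without touching the existing ones.
        self._modules.append(module)

    def publish(self, message: dict) -> None:
        self._queue.put(message)

    def drain(self) -> int:
        """Deliver every queued message to every registered module."""
        delivered = 0
        while not self._queue.empty():
            message = self._queue.get()
            for module in self._modules:
                module.handle(message)
                delivered += 1
        return delivered


class AlarmModule:
    """Example module that only records the messages it receives."""

    name = "alarm"

    def __init__(self) -> None:
        self.seen: List[dict] = []

    def handle(self, message: dict) -> None:
        self.seen.append(message)
```

Because a module depends only on the `handle` interface and the bus, a further module can be registered later without changing `AlarmModule` or any other existing module.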
step S3: the data acquisition layer collects the required cloud monitoring index information from the managed devices through multiple network protocols and interfaces, including SNMP/SNMP Trap, Telnet, SSH, WMI, JDBC, Syslog and open APIs; the collected data is placed in a cache for analysis and computation and is then stored in a database for the upper-layer platform to analyze and display;
the cloud monitoring index information covers monitoring of the server itself and the performance of web sites;
the monitoring of the server itself includes: CPU utilization rate, CPU load, memory utilization rate, disk space utilization rate, disk I/O, network flow, system process number, process CPU/memory/state monitoring, service monitoring and log monitoring;
the web site performance indexes include: site URL/HTTP availability and response time, UDP/TCP port availability and response time, POP3/SMTP/FTP port availability and response time;
rich and detailed data acquisition improves the accuracy of alarms;
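The acquisition path of step S3 (collect, cache, then store) can be sketched for one of the listed indexes, TCP port availability and response time. This is a minimal sketch under stated assumptions: the function names `probe_tcp` and `flush` and the table name `tcp_probe` are hypothetical, a plain list stands in for the cache, and SQLite stands in for the database.

```python
import socket
import sqlite3
import time
from typing import List, Optional, Tuple

# (host, port, available, response time in seconds or None)
Sample = Tuple[str, int, bool, Optional[float]]


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> Sample:
    """Try to open a TCP connection and time how long it takes."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (host, port, True, time.perf_counter() - start)
    except OSError:
        # Refused, unreachable or timed out: the port counts as unavailable.
        return (host, port, False, None)


def flush(cache: List[Sample], db: sqlite3.Connection) -> int:
    """Move the cached samples into the database and empty the cache."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS tcp_probe "
        "(host TEXT, port INTEGER, available INTEGER, response_s REAL)"
    )
    rows = [(h, p, int(ok), rt) for h, p, ok, rt in cache]
    db.executemany("INSERT INTO tcp_probe VALUES (?, ?, ?, ?)", rows)
    db.commit()
    cache.clear()
    return len(rows)
```

A collector would append the result of each `probe_tcp` call to the cache and call `flush` periodically, so that the upper layers read only from the database.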
step S4: the data processing layer comprises multiple subsystems, such as resource monitoring, service flow management, configuration management, asset management and operation and maintenance big data analysis; it comprises one or more DCS and receives the data collected by each DCS, and, by analyzing and mining the collected data, it provides the performance data on which the front-end display is based; when an index exceeds its threshold, it generates a fault alarm and sends it to the data presentation layer;
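The threshold check at the end of step S4 can be sketched as follows. This is a minimal sketch, assuming that each index has a single upper threshold; the names `FaultAlarm` and `evaluate` and the index names in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FaultAlarm:
    resource: str
    metric: str
    value: float
    threshold: float


def evaluate(
    resource: str,
    metrics: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[FaultAlarm]:
    """Return one fault alarm for every index that exceeds its threshold."""
    return [
        FaultAlarm(resource, name, value, thresholds[name])
        for name, value in metrics.items()
        # Indexes without a configured threshold are never alarmed.
        if name in thresholds and value > thresholds[name]
    ]
```

For example, `evaluate("vm-01", {"cpu_util": 0.95, "mem_util": 0.40}, {"cpu_util": 0.90, "mem_util": 0.85})` yields a single alarm for `cpu_util`, which would then be passed to the data presentation layer.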
step S5: the data presentation layer uses Web technology to provide role-based, visual data display and management; it manages the IT resource environment comprehensively through functions such as business management, resource management, topology management, inspection management and alarm management, and provides extensive statistical and analytical data and display pages to meet the needs of daily work; it provides alarm monitoring for the network and multiple modes of data integration with the cloud platform and the dynamic environment system; using the big data platform component's operation and maintenance index evaluation and analysis model, it mines information on services, indexes and faults from the patterns of change in historical data, which helps to locate root causes and points for improvement; it presents the integrated monitoring information and alarm information in the platform and performs service correlation analysis and alarm correlation analysis;
the user can configure policies for receiving, filtering and alarming on Syslog and SNMP Trap messages, view the received Syslog and SNMP Trap messages, and manually synchronize device information indexes, so that the monitoring data is timely and accurate;
through topology management, all backbone network devices, subnets and interconnection relationships of the whole network are displayed in a visually clear manner; the hierarchical network display follows the logical structure of the network and is associated with the Syslog alarm information of the devices, which facilitates fault isolation and rapid fault location; topology management provides functions such as intuitive 2D machine room topology management, automatic map topology management, IP-MAC-PORT and real panel management, and integrates the remote operation and maintenance tools Telnet, SSH, TraceRT and Ping, which facilitates remote control of IT resources;
step S6: the Nagios open-source monitoring system is used for alarming;
alarms are delivered in the following ways:
1) through the web console;
2) by Email;
3) by mobile phone short message;
the conditions involved in an alarm include: the alarm trigger, the alarm threshold and the alarm period;
when an alarm is triggered, a log entry must be written, and when an alarm item cannot be started, an alarm failure prompt is sent to the user.
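The alarm conditions of step S6 can be sketched in plain Python. This is not Nagios configuration and does not use any Nagios API; it is a minimal sketch under stated assumptions. The name `AlarmRule` and its fields are hypothetical, the alarm period is assumed to mean the minimum interval between two alarms for the same rule, and a delivery channel that raises an exception is treated as an alarm item that could not be started.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Optional

log = logging.getLogger("alarm")


@dataclass
class AlarmRule:
    metric: str
    threshold: float   # alarm threshold
    period_s: float    # alarm period (assumed: minimum seconds between alarms)
    channels: List[Callable[[str], None]] = field(default_factory=list)
    on_failure: Callable[[str], None] = log.error  # failure prompt to the user
    last_fired: Optional[float] = None

    def check(self, value: float, now: float) -> bool:
        """Fire when the value exceeds the threshold and the period has elapsed."""
        if value <= self.threshold:
            return False
        if self.last_fired is not None and now - self.last_fired < self.period_s:
            return False
        self.last_fired = now
        message = f"{self.metric}={value} exceeds threshold {self.threshold}"
        log.warning("alarm triggered: %s", message)  # log write on trigger
        for send in self.channels:
            try:
                send(message)  # e.g. web console, Email or short message
            except Exception as exc:
                self.on_failure(f"alarm delivery failed: {exc}")
        return True
```

Each entry in `channels` would wrap one delivery method, such as the web console, Email or short message; the rule itself stays independent of how the message is sent.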
The above description covers only the preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the protection scope of the present invention.
Claims (6)
1. A method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources is characterized by comprising the following steps:
step S1: monitoring and alarming are performed by an integrated intelligent monitoring and alarm platform; depending on the actual IT environment, the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts, and a single DCS or multiple DCS can be used for management capacity planning according to the scale of the managed objects, so that both centralized and distributed deployment modes are supported and heterogeneous IT resources, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly;
step S2: the integrated intelligent monitoring and alarm platform comprises a data acquisition layer, a data processing layer and a data presentation layer;
step S3: the data acquisition layer collects the required cloud monitoring index information from the managed devices through multiple network protocols and interfaces, including SNMP/SNMP Trap, Telnet, SSH, WMI, JDBC, Syslog and open APIs; the collected data is placed in a cache for analysis and computation and is then stored in a database for the upper-layer platform to analyze and display;
step S4: the data processing layer comprises multiple subsystems, such as resource monitoring, service flow management, configuration management, asset management and operation and maintenance big data analysis; it comprises one or more DCS and receives the data collected by each DCS, and, by analyzing and mining the collected data, it provides the performance data on which the front-end display is based; when an index exceeds its threshold, it generates a fault alarm and sends it to the data presentation layer;
step S5: the data presentation layer uses Web technology to provide role-based, visual data display and management; it manages the IT resource environment comprehensively through functions such as business management, resource management, topology management, inspection management and alarm management, and provides extensive statistical and analytical data and display pages to meet the needs of daily work; it provides alarm monitoring for the network and multiple modes of data integration with the cloud platform and the dynamic environment system; using the big data platform component's operation and maintenance index evaluation and analysis model, it mines information on services, indexes and faults from the patterns of change in historical data, which helps to locate root causes and points for improvement; it presents the integrated monitoring information and alarm information in the platform and performs service correlation analysis and alarm correlation analysis;
step S6: the Nagios open-source monitoring system is used for alarming;
the conditions involved in an alarm include: the alarm trigger, the alarm threshold and the alarm period;
when an alarm is triggered, a log entry must be written, and when an alarm item cannot be started, an alarm failure prompt is sent to the user.
2. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S2 the platform adopts a modular design with loosely coupled modules, a new module can be connected to the platform directly, and the modules communicate with each other through interfaces and message queues.
3. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S3 the cloud monitoring index information covers monitoring of the server itself and the performance of web sites;
the monitoring of the server itself includes: CPU utilization rate, CPU load, memory utilization rate, disk space utilization rate, disk I/O, network flow, system process number, process CPU/memory/state monitoring, service monitoring and log monitoring;
the web site performance indexes include: site URL/HTTP availability and response time, UDP/TCP port availability and response time, POP3/SMTP/FTP port availability and response time.
4. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S5 the user can configure policies for receiving, filtering and alarming on Syslog and SNMP Trap messages, view the received Syslog and SNMP Trap messages, and manually synchronize device information indexes, so that the monitoring data is timely and accurate.
5. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S5 all backbone network devices, subnets and interconnection relationships of the whole network are displayed in a visually clear manner through topology management; the hierarchical network display follows the logical structure of the network and is associated with the Syslog alarm information of the devices, which facilitates fault isolation and rapid fault location; topology management provides functions such as intuitive 2D machine room topology management, automatic map topology management, IP-MAC-PORT and real panel management, and integrates the remote operation and maintenance tools Telnet, SSH, TraceRT and Ping, which facilitates remote control of IT resources.
6. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S6 alarms are delivered in the following ways:
1) through the web console;
2) by Email;
3) by mobile phone short message.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111028927.2A CN113946497A (en) | 2021-09-03 | 2021-09-03 | Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111028927.2A CN113946497A (en) | 2021-09-03 | 2021-09-03 | Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113946497A (en) | 2022-01-18 |
Family
ID=79327820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111028927.2A Pending CN113946497A (en) | 2021-09-03 | 2021-09-03 | Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113946497A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114710505A (en) * | 2022-04-02 | 2022-07-05 | 杭州云象网络技术有限公司 | Method and system for realizing ecological safety supervision of digital RMB (national currency) based on block chain |
CN114978856A (en) * | 2022-05-11 | 2022-08-30 | 北京辛诺创新科技有限公司 | Multi-cloud computing management platform and method |
CN115865622A (en) * | 2022-11-25 | 2023-03-28 | 南方电网数字平台科技(广东)有限公司 | Multi-cloud monitoring and alarming method and device |
CN116166505A (en) * | 2023-02-22 | 2023-05-26 | 优维科技(深圳)有限公司 | Monitoring platform, method, storage medium and equipment for dual-state IT architecture in financial industry |
WO2023142054A1 (en) * | 2022-01-27 | 2023-08-03 | 中远海运科技股份有限公司 | Container microservice-oriented performance monitoring and alarm method and alarm system |
CN117033158A (en) * | 2023-10-09 | 2023-11-10 | 深圳市金众工程检验检测有限公司 | Comprehensive performance monitoring method based on cloud platform |
-
2021
- 2021-09-03 CN CN202111028927.2A patent/CN113946497A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023142054A1 (en) * | 2022-01-27 | 2023-08-03 | 中远海运科技股份有限公司 | Container microservice-oriented performance monitoring and alarm method and alarm system |
CN114710505A (en) * | 2022-04-02 | 2022-07-05 | 杭州云象网络技术有限公司 | Method and system for realizing ecological safety supervision of digital RMB (national currency) based on block chain |
CN114978856A (en) * | 2022-05-11 | 2022-08-30 | 北京辛诺创新科技有限公司 | Multi-cloud computing management platform and method |
CN115865622A (en) * | 2022-11-25 | 2023-03-28 | 南方电网数字平台科技(广东)有限公司 | Multi-cloud monitoring and alarming method and device |
CN116166505A (en) * | 2023-02-22 | 2023-05-26 | 优维科技(深圳)有限公司 | Monitoring platform, method, storage medium and equipment for dual-state IT architecture in financial industry |
CN116166505B (en) * | 2023-02-22 | 2023-09-26 | 优维科技(深圳)有限公司 | Monitoring platform, method, storage medium and equipment for dual-state IT architecture in financial industry |
CN117033158A (en) * | 2023-10-09 | 2023-11-10 | 深圳市金众工程检验检测有限公司 | Comprehensive performance monitoring method based on cloud platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113946497A (en) | Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources | |
CN104506393B (en) | A kind of system monitoring method based on cloud platform | |
US11616703B2 (en) | Scalable visualization of health data for network devices | |
CN102447570B (en) | Monitoring device and method based on health degree analysis | |
CN104407964B (en) | A kind of centralized monitoring system and method based on data center | |
WO2019233047A1 (en) | Power grid dispatching-based operation and maintenance method | |
CN105282772A (en) | Wireless network data communication equipment monitoring system and equipment monitoring method | |
CN103295155B (en) | Security core service system method for supervising | |
CN103716173B (en) | A kind of method for storing monitoring system and monitoring alarm issue | |
CN105183609A (en) | Real-time monitoring system and method applied to software system | |
CN102523140A (en) | Real-time monitoring device for operation and maintenance of electric power customer service system | |
CN104637265A (en) | Dispatch-automated multilevel integration intelligent watching alarming system | |
CN111488258A (en) | System for analyzing and early warning software and hardware running state | |
CN111083230A (en) | Computer network operation management system | |
WO2015192664A1 (en) | Device monitoring method and apparatus | |
Safrianti et al. | Real-time network device monitoring system with simple network management protocol (SNMP) model | |
CN103973484A (en) | Operation and maintenance management system based on network topological structure | |
US11558242B2 (en) | Generation of synthetic alerts and unified dashboard for viewing multiple layers of data center simultaneously | |
US20230198860A1 (en) | Systems and methods for the temporal monitoring and visualization of network health of direct interconnect networks | |
CN114885014A (en) | Method, device, equipment and medium for monitoring external field equipment state | |
US11425011B2 (en) | System and method for real time monitoring a plurality of network devices | |
CN108599978A (en) | A kind of cloud monitoring method and device | |
CN111817865A (en) | Method for monitoring network management equipment and monitoring system | |
CN203911977U (en) | Cross-network monitoring system for information equipment | |
CN109614292A (en) | Host operation data automatic collection monitoring system based on shell |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||