CN113946497A - Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources - Google Patents

Info

Publication number
CN113946497A
Authority
CN
China
Prior art keywords
alarm
monitoring
alarming
management
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111028927.2A
Other languages
Chinese (zh)
Inventor
李济伟
王怀宇
来风刚
李伟良
李岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202111028927.2A
Publication of CN113946497A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources, which comprises the following steps: step S1: monitoring and alarming are performed with an integrated intelligent monitoring and alarming platform; the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts according to the actual conditions of the IT environment, and a single DCS or multiple DCS can be used to plan management capacity according to the scale of the managed objects, so that both centralized and distributed deployment modes are supported to meet different requirements, and IT resources of different structures, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly. By using the integrated intelligent monitoring and alarming platform to manage, control and analyze the usage of the cloud platform in a unified and comprehensive way, the invention makes monitoring and alarming simpler; at the same time, the range of data acquisition sources is expanded, so that the cloud monitoring index information is more complete and the accuracy of alarming is improved.

Description

Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources
Technical Field
The invention relates to the technical field of cloud platforms, in particular to a method suitable for unified intelligent monitoring and alarming of multiple cloud platform resources.
Background
Cloud monitoring refers to a monitoring service that covers the availability, user experience and security of networks, systems, applications and similar content. Its aim is to ensure that the services of cloud computing users run stably and safely. In a cloud computing platform, monitoring and managing resources is a very important link in ensuring platform reliability: it not only makes it possible to provide an effective management scheme on the basis of resource monitoring, thereby improving resource utilization, but also ensures that a fault is detected as early as possible when it occurs and is resolved by the most effective method.
In a traditional cloud data center, the range of data collected is narrow and the alarm mode is limited to a single form, both of which affect the accuracy of monitoring alarms; a method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources is therefore designed.
Disclosure of Invention
The problem solved by the invention is to provide a method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources. The method uses an integrated intelligent monitoring and alarming platform to manage, control and analyze the usage of the cloud platform in a unified and comprehensive way, which makes monitoring and alarming simpler; at the same time, the range of data acquisition sources is expanded, so that the cloud monitoring index information is more complete and the accuracy of alarming is improved.
To achieve this purpose, the invention adopts the following technical solution:
A method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources comprises the following steps:
step S1: monitoring and alarming are performed with an integrated intelligent monitoring and alarming platform; the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts according to the actual conditions of the IT environment, and a single DCS or multiple DCS can be used to plan management capacity according to the scale of the managed objects, so that both centralized and distributed deployment modes are supported to meet different requirements, and IT resources of different structures, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly;
step S2: the integrated intelligent monitoring and alarming platform comprises a data acquisition layer, a data processing layer and a data display layer;
step S3: the data acquisition layer acquires the required cloud monitoring index information from the managed devices through a plurality of network protocols, including SNMP/SNMP Trap, Telnet, SSH, WMI, JDBC, Syslog and open APIs, among others; the acquired data is placed in a cache for analysis and computation and is then stored in a database for the upper-layer platform to analyze and display;
step S4: the data processing layer comprises a plurality of subsystems, such as resource monitoring, service flow management, configuration management, asset management and operation and maintenance big data analysis; it comprises one or more DCS (distributed control systems), receives the data acquired by each DCS, and provides the performance data on which the front-end display is based by analyzing and mining the various acquired data; when an index threshold is exceeded, it generates a fault alarm and sends it to the data display layer;
step S5: the data display layer uses Web technology to provide role-based, visual data display and management; the IT resource environment is managed comprehensively through functions such as business management, resource management, topology management, inspection management and alarm management, and a large number of statistical and analytical data and display pages are provided to meet the needs of daily work; the layer provides alarm monitoring for the network and multiple modes of data integration with the cloud platform and the power and environment monitoring system; using an operation and maintenance index evaluation and analysis model built on big data platform components, it mines related information such as services, indexes and faults according to the change patterns of historical data, helps to find the root causes of problems and points for improvement, presents the integrated monitoring information and alarm information in the platform, and performs service correlation analysis and alarm correlation analysis;
step S6: the Nagios open-source monitoring system is used for alarming;
the conditions involved in an alarm include: the alarm trigger, the alarm threshold and the alarm period;
when an alarm is triggered, a log entry must be written, and when an alarm item cannot be started, an alarm failure prompt is sent to the user.
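The alarm conditions named in step S6 (trigger, threshold and period), together with the log write and the failure prompt, can be illustrated with a minimal Python sketch. It is not Nagios code and not the platform's actual implementation; the names `AlarmRule`, `evaluate` and `start_alarm_item` are hypothetical, and reading the "alarm period" as the minimum interval between repeated alarms is an assumption made for the example.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("alarm")


@dataclass
class AlarmRule:
    """Hypothetical rule: alarm when a metric exceeds `threshold`."""
    metric: str
    threshold: float                     # alarm threshold
    period_s: float                      # alarm period (assumed: min seconds between alarms)
    last_fired: Optional[float] = None   # time of the last successful alarm


def evaluate(rule: AlarmRule, value: float, now: float,
             start_alarm_item: Callable[[str], None]) -> str:
    """Return 'ok', 'suppressed', 'alarm' or 'failed' for one sample."""
    if value <= rule.threshold:
        return "ok"                      # trigger condition not met
    if rule.last_fired is not None and now - rule.last_fired < rule.period_s:
        return "suppressed"              # still inside the alarm period
    message = f"{rule.metric}={value} exceeds threshold {rule.threshold}"
    log.warning("alarm triggered: %s", message)   # log write on every trigger
    try:
        start_alarm_item(message)
    except Exception as exc:             # the alarm item could not be started
        log.error("alarm failure prompt to user: %s", exc)
        return "failed"
    rule.last_fired = now
    return "alarm"
```

In this sketch a failed start does not update `last_fired`, so the next sample above the threshold is tried again rather than suppressed.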
As a further scheme of the invention: in step S2, the platform adopts a modular design in which the modules are loosely coupled, a new module can be connected to the platform directly, and the modules communicate with one another through interfaces and message queues.
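Loose coupling through a message queue can be sketched as follows. This is only an in-process illustration using the Python standard library; the class `MessageBus` and the topic name used in the usage note are hypothetical, and a real deployment would more likely rely on an external message broker, which the text does not name.

```python
import queue
from typing import Callable, Dict, List


class MessageBus:
    """Hypothetical in-process bus: modules only know topics, not each other."""

    def __init__(self) -> None:
        self._queue = queue.Queue()
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def register(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Attach a new module's handler without touching existing modules."""
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        """Queue a message; the publisher does not wait for consumers."""
        self._queue.put((topic, payload))

    def drain(self) -> int:
        """Deliver all queued messages; return how many were taken off the queue."""
        delivered = 0
        while True:
            try:
                topic, payload = self._queue.get_nowait()
            except queue.Empty:
                return delivered
            for handler in self._handlers.get(topic, []):
                handler(payload)
            delivered += 1
```

A new module is added by calling `register("metrics", handler)`; the acquisition side keeps calling `publish("metrics", {...})` unchanged.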
As a further scheme of the invention: in step S3, the cloud monitoring index information covers monitoring of the server itself and the performance of the web site;
the monitoring of the server itself includes: CPU utilization, CPU load, memory utilization, disk space utilization, disk I/O, network traffic, number of system processes, per-process CPU/memory/state monitoring, service monitoring and log monitoring;
the performance of the web site includes: site URL/HTTP availability and response time, UDP/TCP port availability and response time, and POP3/SMTP/FTP port availability and response time.
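Two of the listed indexes, TCP port availability with response time and disk space utilization, can be collected with the Python standard library as sketched below. The function names `check_tcp_port` and `disk_utilization` are hypothetical, and measuring response time as the duration of a single TCP connect is an assumption; the text does not say how the platform measures it.

```python
import shutil
import socket
import time
from typing import Optional, Tuple


def check_tcp_port(host: str, port: int,
                   timeout: float = 2.0) -> Tuple[bool, Optional[float]]:
    """Return (available, response_time_seconds) for one TCP connect attempt."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.perf_counter() - start
    except OSError:                      # refused, unreachable or timed out
        return False, None


def disk_utilization(path: str = ".") -> float:
    """Return used disk space on the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)      # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total
```

The same connect check applies to POP3, SMTP and FTP ports, since each of them is an ordinary TCP port; UDP availability needs a protocol-specific probe and is not covered here.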
As a further scheme of the invention: in step S5, the user can configure policies for receiving, filtering and alarming on Syslog and SNMP Trap messages, view the received Syslog and SNMP Trap information, and manually synchronize device information indexes, so that the monitoring data is timely and accurate.
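One possible shape of such a filter policy for Syslog is shown below as a hedged sketch. It relies only on the standard Syslog priority encoding, in which the leading `<PRI>` value equals facility × 8 + severity; the function names and the default rule "alarm on severity 3 (error) or more severe" are assumptions for illustration, not the platform's actual policy.

```python
import re
from typing import Optional, Tuple

_PRI = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str) -> Optional[Tuple[int, int]]:
    """Split the leading <PRI> of a Syslog message into (facility, severity)."""
    match = _PRI.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:                        # 23 * 8 + 7 is the largest valid PRI
        return None
    return divmod(pri, 8)                # PRI = facility * 8 + severity


def should_alarm(message: str, max_severity: int = 3) -> bool:
    """Hypothetical policy: alarm on severity 0 (emergency) up to max_severity."""
    parsed = parse_pri(message)
    return parsed is not None and parsed[1] <= max_severity
```

Lower severity numbers are more urgent, so the comparison is `<=`; messages without a valid priority field are filtered out rather than alarmed on.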
As a further scheme of the invention: in step S5, topology management displays all backbone network devices, subnets and interconnection relationships of the entire network intuitively and clearly; the hierarchical network display conforms to the logical structure of the network and is associated with the Syslog alarm information of the devices, which facilitates fault isolation and rapid fault location; topology management provides functions such as intuitive 2D machine room topology management, automatic map topology management, IP-MAC-PORT and real panel management, and integrates the remote operation and maintenance tools Telnet, SSH, TraceRT and Ping, which facilitates remote control of IT resources.
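How associating device alarms with a hierarchical topology can speed up fault location is sketched below. The heuristic (report only alarming devices that have no alarming device upstream of them) and the names `locate_fault_candidates` and `uplink` are assumptions introduced for illustration; the text does not describe the platform's actual correlation logic.

```python
from typing import Dict, List, Set


def locate_fault_candidates(uplink: Dict[str, str], alarming: Set[str]) -> List[str]:
    """Return alarming devices with no alarming ancestor towards the core.

    `uplink` maps each device to its parent in the hierarchical topology;
    devices missing from the map are treated as top-level devices.
    """
    candidates = []
    for device in sorted(alarming):
        seen = {device}
        node = uplink.get(device)
        while node is not None and node not in seen:
            if node in alarming:
                break                    # an upstream device already explains this alarm
            seen.add(node)
            node = uplink.get(node)
        else:
            candidates.append(device)    # reached the top without an alarming ancestor
    return candidates
```

With an aggregation switch and its two access switches all alarming, only the aggregation switch is reported, which narrows the place an operator needs to look first.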
As a further scheme of the invention: in step S6, the alarm is delivered in the following ways:
1) obtaining an alarm through the web console;
2) receiving an alarm by Email;
3) receiving an alarm by mobile phone short message.
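Delivering one alarm over the three channels listed above can be sketched as a simple fan-out. The channel functions are stand-ins: no real Email or short-message gateway is called, and the names `dispatch_alarm` and `Channel` are hypothetical rather than part of Nagios or of the described platform.

```python
from typing import Callable, Dict

Channel = Callable[[str], None]


def dispatch_alarm(message: str, channels: Dict[str, Channel]) -> Dict[str, bool]:
    """Send one alarm to every configured channel; report per-channel success."""
    results: Dict[str, bool] = {}
    for name, send in channels.items():
        try:
            send(message)
            results[name] = True
        except Exception:
            # One failing channel must not block the others; the caller can
            # surface the False entry as an alarm failure prompt.
            results[name] = False
    return results
```

A caller would pass something like `{"console": console_send, "email": email_send, "sms": sms_send}`, where each value wraps the real delivery mechanism.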
The invention has the following beneficial effects: the integrated intelligent monitoring and alarming platform manages, controls and analyzes the usage of the cloud platform in a unified and comprehensive way, which makes monitoring and alarming simpler; at the same time, the range of data acquisition sources is expanded, so that the cloud monitoring index information is more complete and the accuracy of alarming is improved.
Drawings
FIG. 1 is a schematic structural diagram of an intelligent monitoring and warning platform according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Specific examples are given below.
Referring to FIG. 1, a method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources comprises the following steps:
step S1: monitoring and alarming are performed with an integrated intelligent monitoring and alarming platform; the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts according to the actual conditions of the IT environment, and a single DCS or multiple DCS can be used to plan management capacity according to the scale of the managed objects, so that both centralized and distributed deployment modes are supported to meet different requirements, and IT resources of different structures, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly;
step S2: the integrated intelligent monitoring and alarming platform comprises a data acquisition layer, a data processing layer and a data display layer;
the platform adopts a modular design in which the modules are loosely coupled, a new module can be connected to the platform directly, and the modules communicate with one another through interfaces and message queues;
step S3: the data acquisition layer acquires the required cloud monitoring index information from the managed devices through a plurality of network protocols, including SNMP/SNMP Trap, Telnet, SSH, WMI, JDBC, Syslog and open APIs, among others; the acquired data is placed in a cache for analysis and computation and is then stored in a database for the upper-layer platform to analyze and display;
the cloud monitoring index information covers monitoring of the server itself and the performance of the web site;
the monitoring of the server itself includes: CPU utilization, CPU load, memory utilization, disk space utilization, disk I/O, network traffic, number of system processes, per-process CPU/memory/state monitoring, service monitoring and log monitoring;
the performance of the web site includes: site URL/HTTP availability and response time, UDP/TCP port availability and response time, and POP3/SMTP/FTP port availability and response time;
the accuracy of alarming is improved through rich and detailed data acquisition;
step S4: the data processing layer comprises a plurality of subsystems, such as resource monitoring, service flow management, configuration management, asset management and operation and maintenance big data analysis; it comprises one or more DCS (distributed control systems), receives the data acquired by each DCS, and provides the performance data on which the front-end display is based by analyzing and mining the various acquired data; when an index threshold is exceeded, it generates a fault alarm and sends it to the data display layer;
step S5: the data display layer uses Web technology to provide role-based, visual data display and management; the IT resource environment is managed comprehensively through functions such as business management, resource management, topology management, inspection management and alarm management, and a large number of statistical and analytical data and display pages are provided to meet the needs of daily work; the layer provides alarm monitoring for the network and multiple modes of data integration with the cloud platform and the power and environment monitoring system; using an operation and maintenance index evaluation and analysis model built on big data platform components, it mines related information such as services, indexes and faults according to the change patterns of historical data, helps to find the root causes of problems and points for improvement, presents the integrated monitoring information and alarm information in the platform, and performs service correlation analysis and alarm correlation analysis;
the user can configure policies for receiving, filtering and alarming on Syslog and SNMP Trap messages, view the received Syslog and SNMP Trap information, and manually synchronize device information indexes, so that the monitoring data is timely and accurate;
topology management displays all backbone network devices, subnets and interconnection relationships of the entire network intuitively and clearly; the hierarchical network display conforms to the logical structure of the network and is associated with the Syslog alarm information of the devices, which facilitates fault isolation and rapid fault location; topology management provides functions such as intuitive 2D machine room topology management, automatic map topology management, IP-MAC-PORT and real panel management, and integrates the remote operation and maintenance tools Telnet, SSH, TraceRT and Ping, which facilitates remote control of IT resources;
step S6: the Nagios open-source monitoring system is used for alarming;
the alarm is delivered in the following ways:
1) obtaining an alarm through the web console;
2) receiving an alarm by Email;
3) receiving an alarm by mobile phone short message;
the conditions involved in an alarm include: the alarm trigger, the alarm threshold and the alarm period;
when an alarm is triggered, a log entry must be written, and when an alarm item cannot be started, an alarm failure prompt is sent to the user.
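The acquisition path in step S3, in which collected data first goes into a cache for analysis and computation and is then stored in a database, can be sketched as follows. The sketch assumes a per-metric sliding window as the cache and an in-memory SQLite database as the store; the class `AcquisitionBuffer`, the table layout and the moving average are illustrative choices, not details given in the text.

```python
import sqlite3
from collections import defaultdict, deque
from statistics import mean


class AcquisitionBuffer:
    """Hypothetical cache-then-store pipeline for collected index samples."""

    def __init__(self, conn: sqlite3.Connection, window: int = 5) -> None:
        self._conn = conn
        # Cache: the last `window` samples per (device, metric) pair.
        self._cache = defaultdict(lambda: deque(maxlen=window))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples "
            "(device TEXT, metric TEXT, value REAL, window_avg REAL)"
        )

    def ingest(self, device: str, metric: str, value: float) -> float:
        """Cache one sample, compute its window average, then persist both."""
        window = self._cache[(device, metric)]
        window.append(value)
        avg = mean(window)               # the "analysis and computation" step
        self._conn.execute(
            "INSERT INTO samples VALUES (?, ?, ?, ?)",
            (device, metric, value, avg),
        )
        self._conn.commit()
        return avg
```

The stored `window_avg` column is what an upper layer could compare against an index threshold in step S4 without recomputing it from raw samples.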
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical solution and inventive concept of the present invention, shall fall within the protection scope of the present invention.

Claims (6)

1. A method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources, characterized by comprising the following steps:
step S1: monitoring and alarming are performed with an integrated intelligent monitoring and alarming platform; the Portal service, CCS service and DCS service of the platform are deployed on the same host or on different hosts according to the actual conditions of the IT environment, and a single DCS or multiple DCS can be used to plan management capacity according to the scale of the managed objects, so that both centralized and distributed deployment modes are supported to meet different requirements, and IT resources of different structures, such as the internal network, external network, headquarters and branches of an enterprise, are managed flexibly;
step S2: the integrated intelligent monitoring and alarming platform comprises a data acquisition layer, a data processing layer and a data display layer;
step S3: the data acquisition layer acquires the required cloud monitoring index information from the managed devices through a plurality of network protocols, including SNMP/SNMP Trap, Telnet, SSH, WMI, JDBC, Syslog and open APIs, among others; the acquired data is placed in a cache for analysis and computation and is then stored in a database for the upper-layer platform to analyze and display;
step S4: the data processing layer comprises a plurality of subsystems, such as resource monitoring, service flow management, configuration management, asset management and operation and maintenance big data analysis; it comprises one or more DCS (distributed control systems), receives the data acquired by each DCS, and provides the performance data on which the front-end display is based by analyzing and mining the various acquired data; when an index threshold is exceeded, it generates a fault alarm and sends it to the data display layer;
step S5: the data display layer uses Web technology to provide role-based, visual data display and management; the IT resource environment is managed comprehensively through functions such as business management, resource management, topology management, inspection management and alarm management, and a large number of statistical and analytical data and display pages are provided to meet the needs of daily work; the layer provides alarm monitoring for the network and multiple modes of data integration with the cloud platform and the power and environment monitoring system; using an operation and maintenance index evaluation and analysis model built on big data platform components, it mines related information such as services, indexes and faults according to the change patterns of historical data, helps to find the root causes of problems and points for improvement, presents the integrated monitoring information and alarm information in the platform, and performs service correlation analysis and alarm correlation analysis;
step S6: the Nagios open-source monitoring system is used for alarming;
the conditions involved in an alarm include: the alarm trigger, the alarm threshold and the alarm period;
when an alarm is triggered, a log entry must be written, and when an alarm item cannot be started, an alarm failure prompt is sent to the user.
2. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S2, the platform adopts a modular design in which the modules are loosely coupled, a new module can be connected to the platform directly, and the modules communicate with one another through interfaces and message queues.
3. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S3, the cloud monitoring index information covers monitoring of the server itself and the performance of the web site;
the monitoring of the server itself includes: CPU utilization, CPU load, memory utilization, disk space utilization, disk I/O, network traffic, number of system processes, per-process CPU/memory/state monitoring, service monitoring and log monitoring;
the performance of the web site includes: site URL/HTTP availability and response time, UDP/TCP port availability and response time, and POP3/SMTP/FTP port availability and response time.
4. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S5, the user can configure policies for receiving, filtering and alarming on Syslog and SNMP Trap messages, view the received Syslog and SNMP Trap information, and manually synchronize device information indexes, so that the monitoring data is timely and accurate.
5. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S5, topology management displays all backbone network devices, subnets and interconnection relationships of the entire network intuitively and clearly; the hierarchical network display conforms to the logical structure of the network and is associated with the Syslog alarm information of the devices, which facilitates fault isolation and rapid fault location; topology management provides functions such as intuitive 2D machine room topology management, automatic map topology management, IP-MAC-PORT and real panel management, and integrates the remote operation and maintenance tools Telnet, SSH, TraceRT and Ping, which facilitates remote control of IT resources.
6. The method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources according to claim 1, wherein in step S6, the alarm is delivered in the following ways:
1) obtaining an alarm through the web console;
2) receiving an alarm by Email;
3) receiving an alarm by mobile phone short message.
CN202111028927.2A 2021-09-03 2021-09-03 Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources Pending CN113946497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111028927.2A CN113946497A (en) 2021-09-03 2021-09-03 Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111028927.2A CN113946497A (en) 2021-09-03 2021-09-03 Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources

Publications (1)

Publication Number Publication Date
CN113946497A (en) 2022-01-18

Family

ID=79327820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111028927.2A Pending CN113946497A (en) 2021-09-03 2021-09-03 Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources

Country Status (1)

Country Link
CN (1) CN113946497A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023142054A1 (en) * 2022-01-27 2023-08-03 中远海运科技股份有限公司 Container microservice-oriented performance monitoring and alarm method and alarm system
CN114710505A (en) * 2022-04-02 2022-07-05 杭州云象网络技术有限公司 Method and system for realizing ecological safety supervision of digital RMB (national currency) based on block chain
CN114978856A (en) * 2022-05-11 2022-08-30 北京辛诺创新科技有限公司 Multi-cloud computing management platform and method
CN115865622A (en) * 2022-11-25 2023-03-28 南方电网数字平台科技(广东)有限公司 Multi-cloud monitoring and alarming method and device
CN116166505A (en) * 2023-02-22 2023-05-26 优维科技(深圳)有限公司 Monitoring platform, method, storage medium and equipment for dual-state IT architecture in financial industry
CN116166505B (en) * 2023-02-22 2023-09-26 优维科技(深圳)有限公司 Monitoring platform, method, storage medium and equipment for dual-state IT architecture in financial industry
CN117033158A (en) * 2023-10-09 2023-11-10 深圳市金众工程检验检测有限公司 Comprehensive performance monitoring method based on cloud platform

Similar Documents

Publication Publication Date Title
CN113946497A (en) Method suitable for unified intelligent monitoring and alarming of multi-cloud platform resources
CN104506393B (en) A kind of system monitoring method based on cloud platform
US11616703B2 (en) Scalable visualization of health data for network devices
CN102447570B (en) Monitoring device and method based on health degree analysis
CN104407964B (en) A kind of centralized monitoring system and method based on data center
WO2019233047A1 (en) Power grid dispatching-based operation and maintenance method
CN105282772A (en) Wireless network data communication equipment monitoring system and equipment monitoring method
CN103295155B (en) Security core service system method for supervising
CN103716173B (en) A kind of method for storing monitoring system and monitoring alarm issue
CN105183609A (en) Real-time monitoring system and method applied to software system
CN102523140A (en) Real-time monitoring device for operation and maintenance of electric power customer service system
CN104637265A (en) Dispatch-automated multilevel integration intelligent watching alarming system
CN111488258A (en) System for analyzing and early warning software and hardware running state
CN111083230A (en) Computer network operation management system
WO2015192664A1 (en) Device monitoring method and apparatus
Safrianti et al. Real-time network device monitoring system with simple network management protocol (SNMP) model
CN103973484A (en) Operation and maintenance management system based on network topological structure
US11558242B2 (en) Generation of synthetic alerts and unified dashboard for viewing multiple layers of data center simultaneously
US20230198860A1 (en) Systems and methods for the temporal monitoring and visualization of network health of direct interconnect networks
CN114885014A (en) Method, device, equipment and medium for monitoring external field equipment state
US11425011B2 (en) System and method for real time monitoring a plurality of network devices
CN108599978A (en) A kind of cloud monitoring method and device
CN111817865A (en) Method for monitoring network management equipment and monitoring system
CN203911977U (en) Cross-network monitoring system for information equipment
CN109614292A (en) Host operation data automatic collection monitoring system based on shell

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination