CN114422339A - Automatic scheduling distributed data monitoring system and method - Google Patents
- Publication number
- CN114422339A (application CN202210314581.0A)
- Authority
- CN
- China
- Prior art keywords
- alarm
- alarm rule
- modules
- module
- slave
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0681—Configuration of triggering conditions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/203—Failover techniques using migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3089—Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
- G06F11/3093—Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/02—Standardisation; Integration
- H04L41/022—Multivendor or multi-standard integration
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/02—Standardisation; Integration
- H04L41/0246—Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols
- H04L41/0253—Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols using browsers or web-pages for accessing management information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/046—Network management architectures or arrangements comprising network management agents or mobile agents therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
- H04L43/0805—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
- H04L43/0817—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/80—Database-specific techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Environmental & Geological Engineering (AREA)
- Mathematical Physics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides an automatically scheduled distributed data monitoring system and method, belonging to the technical field of monitoring systems. The system comprises an Nginx reverse proxy server, a plurality of alarm rule modules, a Kafka queue and a plurality of alarm calculation modules. The invention addresses the performance problems of computing over massive alarm rules and the unavailability of the system when an alarm module goes down during calculation. It also solves the problem of information loss caused by alarm rules not being processed in time at peak load when alarm rules are created in batches. Finally, the invention provides a method of configuring alarm rules along dimensions such as the user, time and the monitored object.
Description
Technical Field
The invention belongs to the technical field of monitoring systems, and particularly relates to an automatic scheduling distributed data monitoring system and method.
Background
In cloud computing and Internet-of-things systems, all business modules must work cooperatively, and their internal data are characterized by heterogeneity and loose coupling. These systems require a stable, available monitoring system to ensure their health and stability. A data monitoring system plays an important role in guaranteeing data security and data service quality, so research on monitoring systems is very meaningful.
A common monitoring system generally uses a data-interface approach: data is pushed to the monitoring system by a third-party service, or pulled from the third-party service, and centralized rule calculation is then performed inside the monitoring system to decide whether to raise an alarm. Owing to this design, when the number of data points from third-party services to be connected is large and many alarm rules operate on those data, the monitoring system easily reaches a performance bottleneck in data access and data processing, and the system scales poorly.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an automatic scheduling distributed data monitoring system and method.
In order to achieve the above purpose, the invention provides the following technical scheme:
an automatically scheduled distributed data monitoring system comprising:
the Nginx reverse proxy server is in communication connection with the monitored object through the HTTP, MQTT and WebSocket protocols;
the plurality of alarm rule modules are deployed in a Master-slave architecture, and are connected with the Nginx reverse proxy server through the HTTP protocol;
the Kafka queue is connected with the plurality of alarm rule modules through the HTTP protocol;
the plurality of alarm calculation modules are connected through a Zookeeper server by means of the TCP protocol; each alarm calculation module is in communication connection with the Kafka queue.
Preferably, the Master-slave architecture includes:
a plurality of slave nodes in communication connection with the Zookeeper server and used for computing services;
a Master node in communication connection with the Zookeeper server, mainly used for managing the plurality of slave nodes while also carrying computing services.
Preferably, the system further comprises:
the MYSQL database is in communication connection with the plurality of alarm rule modules;
the time sequence database OpenTsdb is in communication connection with the Kafka queue and the plurality of alarm calculation modules;
and the centralized cache module is in communication connection with the Kafka queue and the plurality of alarm calculation modules.
An automatically scheduled distributed data monitoring method comprises the following steps:
the Nginx reverse proxy server acquires a monitoring request of a monitored object and stores the monitoring request in a time sequence database OpenTsdb;
when the alarm rule module is down, the alarm rule modules are redeployed; otherwise, the plurality of alarm rule modules respectively formulate alarm rules and multidimensional data logic models according to the monitoring request of the monitored object, and store the plurality of alarm rules and the data logic models in the MYSQL database;
the Kafka queue decouples the plurality of alarm rule modules and the plurality of alarm calculation modules;
a plurality of alarm rules and data logic models enter a Kafka queue;
the Zookeeper server coordinates the plurality of alarm calculation modules to respectively acquire, from the Kafka queue, the alarm rules and data logic models that each maintains, and the plurality of alarm calculation modules compute according to the alarm rules and data logic models to obtain a plurality of monitoring alarm results in parallel;
and the monitoring alarm results are pushed to the monitored object or a third-party system through the Kafka queue.
Preferably, when an alarm rule module goes down, the step of redeploying the plurality of alarm rule modules includes:
when the detection cluster on the alarm rule module of the Master node detects that a slave node is down;
the alarm rule module of the Master node obtains the IP address of the down slave alarm rule module from the Zookeeper server;
the alarm rule module of the Master node acquires the alarm rules corresponding to the IP address of the down slave alarm rule module from the cache data of the centralized cache module;
the alarm rule module of the Master node sends the alarm rules corresponding to the IP address of the down machine to the Kafka queue;
and the alarm rule module of the Master node and the surviving slave alarm rule modules form a new Master-slave architecture.
Preferably, when the alarm rule module of the Master node goes down, the step of redeploying the plurality of alarm rule modules includes:
when the detection cluster on the alarm rule module of any slave node detects that the alarm rule module of the Master node is down;
the Zookeeper server selects a slave alarm rule module as the new Master alarm rule module; the new Master alarm rule module obtains the IP address of the down Master alarm rule module from the Zookeeper server;
the new Master alarm rule module acquires the alarm rules corresponding to the IP address of the down Master alarm rule module from the cache;
the new Master alarm rule module sends the alarm rules corresponding to the IP address of the down Master to the Kafka queue;
and the alarm rule module of the new Master node and the plurality of slave alarm rule modules form a new Master-slave architecture.
Preferably, the step of the plurality of alarm calculation modules computing according to the alarm rules and data logic models to obtain a plurality of monitoring alarm results in parallel comprises:
the alarm calculation module converts the data logic model into a monitoring request and obtains the data of the monitored object from the monitoring request;
and the alarm calculation module calculates the data of the monitored object by using the alarm rule to obtain a monitoring alarm result.
Preferably, the step of entering the Kafka queue by the plurality of alarm rules and the data logic model comprises:
a plurality of alarm rules and a data logic model are entered into a Kafka queue from a plurality of alarm rule modules.
Preferably, the step of entering the Kafka queue by the plurality of alarm rules and the data logic model includes:
a plurality of alarm rules and data logic models are loaded from the MYSQL database into the Kafka queue.
Preferably, the data logic model includes:
a plurality of namespaces, the number of which is determined by the monitoring request of the monitored object;
a plurality of metrics respectively subordinate to the plurality of namespaces; the number of metrics is determined by the monitoring request of the monitored object;
a plurality of dimensions respectively subordinate to the plurality of metrics; the number of dimensions is determined by the monitoring request of the monitored object.
The automatically scheduled distributed data monitoring system and method provided by the invention have the following beneficial effects: (1) High availability, especially in large-data-volume environments: the distributed design enables the system to handle a greater number of alarms per unit time, so the system can be applied under big-data service loads. (2) High reliability: in the distributed cluster design, when any alarm calculation node goes down, the alarm rule set it maintained is dynamically transferred to other calculation nodes, so the data in system memory is not lost to hardware instability in a complex environment. (3) The invention can establish multi-dimensional alarm rule configurations for different monitored objects, with different time granularities and different alarm calculation methods, thereby meeting the alarm monitoring requirements of monitored objects in complex scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some embodiments of the invention and it will be clear to a person skilled in the art that other drawings can be derived from them without inventive effort.
Fig. 1 is a structure of an automatically scheduled distributed data monitoring system according to embodiment 1 of the present invention;
fig. 2 is a flowchart of an automatically scheduled distributed data monitoring method according to embodiment 1 of the present invention;
FIG. 3 is a production consumption graph of alarm rules in embodiment 1 of the present invention;
fig. 4 is a flowchart of slave downtime alarm rules according to embodiment 1 of the present invention;
fig. 5 is a flow chart of Master downtime alarm rules according to embodiment 1 of the present invention;
FIG. 6 is an interaction diagram of an alarm calculation module according to embodiment 1 of the present invention;
fig. 7 is a tree structure diagram of a data logic model in embodiment 1 of the present invention.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present invention and can practice the same, the present invention will be described in detail with reference to the accompanying drawings and specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "axial", "radial", "circumferential", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing technical solutions of the present invention and simplifying the description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In the description of the present invention, it should be noted that, unless explicitly stated or limited otherwise, the terms "connected" and "coupled" are to be interpreted broadly, e.g., as a fixed connection, a detachable connection, or an integral connection; the connection can be mechanical or electrical; and it may be direct, or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation. In the description of the present invention, unless otherwise specified, "a plurality" means two or more, and this will not be repeated herein.
Example 1
Referring to fig. 1, an automatically scheduled distributed data monitoring system includes: an Nginx reverse proxy server, a plurality of alarm rule modules, a Kafka queue, a plurality of alarm calculation modules, a MYSQL database, the time-series database OpenTsdb and a centralized cache module. The Nginx reverse proxy server is in communication connection with the monitored object through the HTTP, MQTT and WebSocket protocols. The plurality of alarm rule modules are deployed in a distributed load-balancing manner and are connected with the Nginx reverse proxy server through the HTTP protocol. The Kafka queue is connected with the plurality of alarm rule modules through the HTTP protocol. The plurality of alarm calculation modules are connected through the Zookeeper server by means of the TCP protocol. The MYSQL database is in communication connection with the plurality of alarm rule modules. The time-series database OpenTsdb is in communication connection with the Kafka queue and the plurality of alarm calculation modules. The centralized cache module is in communication connection with the Kafka queue and the plurality of alarm calculation modules.
In this embodiment, the distributed load-balancing manner is a Master-slave architecture, which includes a plurality of slave nodes and a Master node. The slave nodes are in communication connection with the Zookeeper server and are used for computing services. The Master node is in communication connection with the Zookeeper server; it mainly manages the plurality of slave nodes while also carrying computing services.
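As a minimal sketch of this coordination, note that a Zookeeper-based Master-slave arrangement is typically built on ephemeral sequential nodes, where the node holding the lowest sequence number acts as Master and a new election happens automatically when its ephemeral node vanishes. The in-memory election below mimics that convention; the node names and sequence numbers are illustrative assumptions, not part of the patent.

```python
# Minimal in-memory sketch of Zookeeper-style leader election:
# each node registers with a sequence number, and the lowest
# sequence number acts as Master.
def elect_master(registered: dict) -> str:
    """registered maps node name -> its (ephemeral) sequence number;
    the node with the lowest sequence number becomes Master."""
    return min(registered, key=registered.get)

nodes = {"node-a": 3, "node-b": 1, "node-c": 2}
print(elect_master(nodes))   # node-b holds the lowest sequence -> Master
del nodes["node-b"]          # Master goes down: its ephemeral node vanishes
print(elect_master(nodes))   # node-c takes over as the new Master
```

In the real system this happens inside Zookeeper itself; the surviving nodes observe the deleted ephemeral node and re-elect without any central coordinator.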
Referring to fig. 2, an automatically scheduled distributed data monitoring method includes the following steps: the Nginx reverse proxy server acquires a monitoring request of a monitored object and stores the monitoring request in the time-series database OpenTsdb; when an alarm rule module is down, the plurality of alarm rule modules are redeployed; otherwise, the plurality of alarm rule modules respectively formulate alarm rules and multidimensional data logic models according to the monitoring request of the monitored object, and store the plurality of alarm rules and data logic models in the MYSQL database; the Kafka queue decouples the plurality of alarm rule modules and the plurality of alarm calculation modules; the plurality of alarm rules and data logic models enter the Kafka queue; the Zookeeper server coordinates the plurality of alarm calculation modules to respectively acquire, from the Kafka queue, the alarm rules and data logic models that each maintains, and the plurality of alarm calculation modules compute according to the alarm rules and data logic models to obtain a plurality of monitoring alarm results in parallel; and the monitoring alarm results are pushed to the monitored object or a third-party system through the Kafka queue.
In this embodiment, the alarm rule module is mainly responsible for setting alarm rules and maintaining data management of the monitored object; a series of alarm rule models are established, via HTTP requests over a local or wide area network, for the monitored objects a user cares about, according to actual requirements. In this embodiment, an alarm threshold can be configured, with support for threshold comparison rules such as "greater than", "greater than or equal to" and "less than or equal to", and the supported data aggregation modes are Sum, Average, Max, Min and Variance. An alarm rule can also configure the calculation period of the monitored data and the number of times the calculation result must exceed the threshold.
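Such a rule can be sketched as follows. This is an illustrative assumption of how a rule object might look (the patent does not give field names), combining a comparison operator, one of the five named aggregation modes, a threshold, and the configured number of consecutive breaching periods before an alarm fires; the full set of comparison operators is assumed.

```python
import statistics
from dataclasses import dataclass, field

# Hypothetical alarm rule: comparison operator + aggregation mode
# (Sum/Average/Max/Min/Variance) + threshold + required breach count.
OPERATORS = {
    ">":  lambda v, t: v > t,
    ">=": lambda v, t: v >= t,
    "<":  lambda v, t: v < t,
    "<=": lambda v, t: v <= t,
}
AGGREGATORS = {
    "Sum": sum,
    "Average": statistics.mean,
    "Max": max,
    "Min": min,
    "Variance": statistics.pvariance,
}

@dataclass
class AlarmRule:
    metric: str
    operator: str        # key into OPERATORS
    aggregation: str     # key into AGGREGATORS
    threshold: float
    breach_count: int    # periods over threshold required before alarming
    _breaches: int = field(default=0, repr=False)

    def evaluate(self, samples: list) -> bool:
        """Aggregate one calculation period of samples and test the threshold.
        Returns True once `breach_count` consecutive periods breach."""
        value = AGGREGATORS[self.aggregation](samples)
        if OPERATORS[self.operator](value, self.threshold):
            self._breaches += 1
        else:
            self._breaches = 0      # a clean period resets the streak
        return self._breaches >= self.breach_count

rule = AlarmRule("temperature", ">=", "Average", 80.0, breach_count=2)
print(rule.evaluate([78, 81, 85]))  # avg ~81.3, first breach  -> False
print(rule.evaluate([82, 83, 84]))  # avg 83.0, second breach -> True
```

The reset on a non-breaching period is a design choice on our part; the patent only states that the number of threshold exceedances is configurable.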
After the alarm calculation modules adopt a distributed architecture, each single alarm calculation node maintains only a subset of the alarm rules; if a node goes down, the subset of alarm rules it maintained would, in theory, be entirely lost, making the system unreliable. This is shown in fig. 3.
The method therefore uses the service coordination capability of Zookeeper in a Master-slave mode. The Master node manages all nodes; when a slave is detected to be down, information such as the IP address of the down machine is acquired. With this information, the Master node looks up all alarm rules corresponding to that IP address in the cache or database, and then resends the alarm rule set that was on the down node to the Kafka queue. The other surviving alarm calculation nodes consume the message queue again, picking up subsets of that alarm rule set in random order. Consequently, the remaining alarm calculation nodes still maintain the complete set of alarm rules after consuming the message queue, giving the distributed system high reliability.
Referring to fig. 4, the step of redeploying the plurality of alarm rule modules includes: when the detection cluster on the alarm rule module of the Master node detects that a slave node is down; the alarm rule module of the Master node obtains the IP address of the down slave alarm rule module from the Zookeeper server; the alarm rule module of the Master node acquires the alarm rules corresponding to that IP address from the cache data of the centralized cache module; the alarm rule module of the Master node sends the alarm rules corresponding to the IP address of the down machine to the Kafka queue; and the alarm rule module of the Master node and the surviving slave alarm rule modules form a new Master-slave architecture.
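The slave-failover flow above can be sketched in memory. Zookeeper (down-node detection) and Kafka (the queue) are stubbed with plain Python structures, and all names here (`rule_cache`, `kafka_queue`, the IP addresses and rule names) are illustrative assumptions.

```python
from collections import deque

kafka_queue: deque = deque()       # stands in for the Kafka topic
rule_cache = {                     # centralized cache: node IP -> its rule subset
    "10.0.0.11": ["rule-cpu", "rule-mem"],
    "10.0.0.12": ["rule-disk"],
}
live_slaves = {"10.0.0.11", "10.0.0.12"}

def on_slave_down(down_ip: str) -> None:
    """Master's reaction: look up the dead node's rules in the cache
    and re-publish them so surviving consumers pick them up."""
    live_slaves.discard(down_ip)               # form the new Master-slave set
    for rule in rule_cache.get(down_ip, []):
        kafka_queue.append(rule)               # resend the rule set to the queue

on_slave_down("10.0.0.11")
print(sorted(kafka_queue))   # ['rule-cpu', 'rule-mem'] back in the queue
print(sorted(live_slaves))   # ['10.0.0.12'] survives
```

The point of routing the orphaned rules through the queue rather than assigning them directly is that any surviving consumer can pick them up, which is exactly what makes the consumption order "random and unordered" as described.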
Referring to fig. 5, when the alarm rule module of the Master node goes down, the step of redeploying the plurality of alarm rule modules includes: when the detection cluster on the alarm rule module of any slave node detects that the alarm rule module of the Master node is down; the Zookeeper server selects a slave alarm rule module as the new Master alarm rule module; the new Master alarm rule module obtains the IP address of the down Master alarm rule module from the Zookeeper server; the new Master alarm rule module acquires the alarm rules corresponding to that IP address from the cache; the new Master alarm rule module sends those alarm rules to the Kafka queue; and the alarm rule module of the new Master node and the plurality of slave alarm rule modules form a new Master-slave architecture.
The open-source time-series database OpenTsdb has a storage logic structure consisting of Metric, Tags, Value and Timestamp. After the data are modeled, the namespace and dimensions in the monitoring data logic model are converted into Tags in the OpenTsdb storage logic, and the metric in the monitoring data logic model is converted into the Metric in OpenTsdb. The acquired data values are then stored in the Tsdb database by tag, metric and time series.
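A minimal sketch of this mapping, assuming one plausible encoding (the tag key names and the exact dict shape are our assumptions; OpenTsdb only requires metric, tags, value and timestamp per data point):

```python
# Model-to-OpenTsdb conversion: namespace and dimensions -> tags,
# the model's metric -> the OpenTsdb metric.
def to_opentsdb_point(namespace: str, metric: str, dimensions: dict,
                      value: float, timestamp: int) -> dict:
    tags = {"namespace": namespace, **dimensions}   # namespace joins the tags
    return {"metric": metric, "tags": tags,
            "value": value, "timestamp": timestamp}

point = to_opentsdb_point("Subsystem-1", "temperature",
                          {"area": "area-1", "device": "device-1"},
                          36.5, 1700000000)
print(point["metric"])              # temperature
print(point["tags"]["namespace"])   # Subsystem-1
```

Because namespace and dimensions all land in the tag set, a later query can slice the same metric by subsystem, area, or device without changing the storage schema.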
As shown in fig. 6, the step of the alarm calculation module obtaining the monitoring alarm result according to the alarm rule and the data logic model includes: the alarm calculation module converts the data logic model into a monitoring request and obtains the data of the monitored object through that request; and the alarm calculation module calculates on the data of the monitored object using the alarm rule to obtain the monitoring alarm result.
In this embodiment, the alarm rules and data logic models enter the Kafka queue in two ways: the plurality of alarm rules and data logic models enter the Kafka queue from the plurality of alarm rule modules, and the plurality of alarm rules and data logic models are loaded into the Kafka queue from the MYSQL database.
The alarm calculation module acts as a consumer of the Kafka queue: the alarm rules are consumed in full from the Kafka queue and stored in process memory, and a timer task then periodically evaluates the alarm data according to the calculation period in each alarm rule. The Kafka queue maintains the complete set of alarm rules of the whole system while each alarm calculation module process consumes only a subset, which greatly reduces the load on each alarm calculation module and keeps the system running healthily and stably.
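The way the full rule set ends up split across calculation modules can be approximated as follows. In a real Kafka consumer group the broker assigns topic partitions to consumers; the round-robin assignment below is a stand-in for that behavior, and the rule and consumer names are illustrative.

```python
from itertools import cycle

# Approximation of consumer-group partitioning: each alarm
# calculation module ends up holding only a subset of the rules.
def partition_rules(rules: list, consumers: list) -> dict:
    subsets = {c: [] for c in consumers}
    for rule, consumer in zip(rules, cycle(consumers)):
        subsets[consumer].append(rule)   # each module keeps its subset in memory
    return subsets

rules = [f"rule-{i}" for i in range(7)]
subsets = partition_rules(rules, ["calc-1", "calc-2", "calc-3"])
print({c: len(s) for c, s in subsets.items()})
# load spread: {'calc-1': 3, 'calc-2': 2, 'calc-3': 2}
```

The union of the subsets is the complete rule set, which is the property the text relies on: the queue holds everything, each process holds only its share.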
In this embodiment, the data logic model includes multiple namespaces, multiple metrics and multiple dimensions, where the metrics are respectively subordinate to the namespaces and the dimensions are respectively subordinate to the metrics. The numbers of namespaces, metrics and dimensions are determined by the monitoring request of the monitored object. Referring to FIG. 7, suppose the monitored object is an Internet-of-things system with two subsystems, Subsystem-1 and Subsystem-2, which are two different namespaces. If the monitoring system needs to monitor temperature and equipment rotating speed in the Internet-of-things system, the metrics are temperature and speed. Device 1 in monitoring area 1 belongs to Subsystem-1, so the metrics in Subsystem-1 correspond to two dimensions, which can be named area-1 and device-1 respectively. Subsystem-2 has only one area and the system monitors data on device 2, so it has only one dimension, device-2.
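The worked IoT example can be written out as the namespace → metric → dimension tree of fig. 7. The nested-dict encoding is our choice; the node names come from the example itself.

```python
# The data logic model tree for the worked example: two namespaces,
# each with temperature and speed metrics, and each metric's dimensions.
data_logic_model = {
    "Subsystem-1": {                            # namespace
        "temperature": ["area-1", "device-1"],  # metric -> its dimensions
        "speed":       ["area-1", "device-1"],
    },
    "Subsystem-2": {
        "temperature": ["device-2"],
        "speed":       ["device-2"],
    },
}

def dimensions_of(model: dict, namespace: str, metric: str) -> list:
    """Walk namespace -> metric -> dimensions in the tree."""
    return model[namespace][metric]

print(dimensions_of(data_logic_model, "Subsystem-1", "temperature"))
# ['area-1', 'device-1']
print(dimensions_of(data_logic_model, "Subsystem-2", "speed"))
# ['device-2']
```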
The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, and any simple changes or equivalent substitutions of the technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (10)
1. An automatically scheduled distributed data monitoring system, comprising:
the Nginx reverse proxy server is in communication connection with the monitored object through the HTTP, MQTT and WebSocket protocols;
the plurality of alarm rule modules are deployed in a Master-slave architecture, and are connected with the Nginx reverse proxy server through the HTTP protocol;
the Kafka queue is connected with the plurality of alarm rule modules through the HTTP protocol;
the plurality of alarm calculation modules are connected through a Zookeeper server by means of the TCP protocol; each alarm calculation module is in communication connection with the Kafka queue.
2. The automatically scheduled distributed data monitoring system of claim 1, wherein the Master-slave architecture comprises:
a plurality of slave nodes in communication connection with the Zookeeper server and used for computing services;
a Master node in communication connection with the Zookeeper server, mainly used for managing the plurality of slave nodes while also carrying computing services.
3. The automatically scheduled distributed data monitoring system of claim 1, further comprising:
the MYSQL database is in communication connection with the plurality of alarm rule modules;
the time sequence database OpenTsdb is in communication connection with the Kafka queue and the plurality of alarm calculation modules;
and the centralized cache module is in communication connection with the Kafka queue and the plurality of alarm calculation modules.
4. An automatically scheduled distributed data monitoring method is characterized by comprising the following steps:
the Nginx reverse proxy server acquires a monitoring request of a monitored object and stores the monitoring request in a time sequence database OpenTsdb;
when an alarm rule module is down, the plurality of alarm rule modules are redeployed; otherwise, the plurality of alarm rule modules respectively formulate alarm rules and multidimensional data logic models according to the monitoring request of the monitored object, and store the plurality of alarm rules and data logic models in the MYSQL database;
the Kafka queue decouples the plurality of alarm rule modules and the plurality of alarm calculation modules;
a plurality of alarm rules and data logic models enter a Kafka queue;
the Zookeeper server coordinates the plurality of alarm calculation modules to respectively acquire, from the Kafka queue, the alarm rule and the data logic model that each maintains, and the plurality of alarm calculation modules respectively operate according to the alarm rules and the data logic models to obtain a plurality of monitoring alarm results in parallel;
and the monitoring alarm result is pushed to the monitored object or a third-party system through the kafka queue.
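The decoupling of rule modules from calculation modules described in the method above can be illustrated with an in-process queue standing in for Kafka. This is a hypothetical sketch, not the patented implementation; the function names and the example rule are assumptions.

```python
import json
import queue

# In-process stand-in for the Kafka queue that decouples the alarm rule
# modules (producers) from the alarm calculation modules (consumers).
rule_topic = queue.Queue()

def rule_module_publish(alarm_rule, model):
    """Rule module side: serialize an alarm rule plus its data logic model
    and place the message on the queue."""
    rule_topic.put(json.dumps({"rule": alarm_rule, "model": model}))

def calc_module_consume():
    """Calculation module side: pull the rule/model message it maintains
    from the queue and deserialize it."""
    return json.loads(rule_topic.get())

# A hypothetical temperature rule for the Subsystem-1 namespace.
rule_module_publish({"metric": "temperature", "op": ">", "threshold": 80},
                    {"namespace": "Subsystem-1"})
msg = calc_module_consume()
```

Because producer and consumer share only the message format, either side can be restarted or rescheduled without the other noticing, which is the decoupling role the Kafka queue plays in the claimed method.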
5. The method according to claim 4, wherein the step of redeploying the plurality of alarm rule modules when an alarm rule module is down comprises:
when the detection cluster on the alarm rule module of the Master node detects that a slave alarm rule module is down;
the alarm rule module of the Master node acquires the IP address of the down slave alarm rule module from the Zookeeper server;
the alarm rule module of the Master node acquires the alarm rule corresponding to the IP address of the down slave alarm rule module from the cache data of the centralized cache module;
the alarm rule module of the Master node sends the alarm rule corresponding to the IP address of the down module to the Kafka queue;
and the alarm rule module of the Master node and the surviving slave alarm rule modules form a new Master-Slave architecture.
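The slave-failure flow of claim 5 can be sketched step by step. This is an illustrative sketch only: the dictionaries stand in for the Zookeeper registry, the centralized cache, and the Kafka queue, and all names and addresses are hypothetical.

```python
# Stand-in for the Zookeeper server: maps module names to IP addresses.
registry = {"slave-1": "10.0.0.11", "slave-2": "10.0.0.12"}

# Stand-in for the centralized cache: alarm rules keyed by module IP.
rule_cache = {"10.0.0.11": [{"metric": "temperature", "threshold": 80}],
              "10.0.0.12": [{"metric": "speed", "threshold": 3000}]}

republished = []  # stand-in for the Kafka queue

def handle_slave_down(dead, members):
    """Master-side handling when the detection cluster reports a dead slave."""
    ip = registry[dead]                       # step 1: look up the down slave's IP
    for rule in rule_cache[ip]:               # step 2: fetch its cached alarm rules
        republished.append(rule)              # step 3: re-send the rules to the queue
    return [m for m in members if m != dead]  # step 4: new Master-Slave membership

members = handle_slave_down("slave-1", ["master", "slave-1", "slave-2"])
```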
6. The method according to claim 4, wherein the step of redeploying the plurality of alarm rule modules when an alarm rule module is down comprises:
when the detection cluster on the alarm rule module of any slave node detects that the alarm rule module of the Master node is down;
the Zookeeper server selects a slave alarm rule module as the new Master alarm rule module; the new Master alarm rule module acquires the IP address of the down Master alarm rule module from the Zookeeper server;
the new Master alarm rule module acquires the alarm rule corresponding to the IP address of the down Master alarm rule module from the cache;
the new Master alarm rule module sends the alarm rule corresponding to the IP address of the down Master to the Kafka queue;
and the alarm rule module of the new Master node and the plurality of slave alarm rule modules form a new Master-Slave architecture.
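The Master-failure flow of claim 6 can be sketched in the same style. This is a hypothetical sketch: real Zookeeper elections use ephemeral sequential nodes, whereas here a deterministic sort stands in for the election, and all names and addresses are assumptions.

```python
def elect_and_recover(dead_master, slaves, registry, rule_cache):
    """Sketch of claim 6: elect a surviving slave as the new Master, then
    recover and re-publish the dead Master's alarm rules from the cache."""
    new_master = sorted(slaves)[0]                 # stand-in for Zookeeper's election
    dead_ip = registry[dead_master]                # IP of the down Master from the registry
    recovered = list(rule_cache.get(dead_ip, []))  # its alarm rules from the cache
    remaining = [s for s in slaves if s != new_master]
    return new_master, remaining, recovered        # new Master-Slave membership + rules

master, slaves, rules = elect_and_recover(
    "master-0",
    ["slave-1", "slave-2"],
    {"master-0": "10.0.0.10"},
    {"10.0.0.10": [{"metric": "temperature", "threshold": 80}]},
)
```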
7. The automatically scheduled distributed data monitoring method of claim 4, wherein the step of the plurality of alarm calculation modules respectively operating according to the alarm rules and the data logic models to obtain a plurality of monitoring alarm results in parallel comprises:
the alarm calculation module converts the data logic model into a monitoring request and acquires the data of the monitored object according to the monitoring request;
and the alarm calculation module operates on the data of the monitored object using the alarm rule to obtain a monitoring alarm result.
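The second step of claim 7, applying an alarm rule to the monitored object's data, can be sketched as a simple threshold check. The rule schema (metric, comparison operator, threshold) is an assumption for illustration, not the patent's defined format.

```python
import operator

# Supported comparison operators for a hypothetical threshold-style rule.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}

def evaluate(rule, sample):
    """Apply one alarm rule to one monitored-object sample and return a
    monitoring alarm result."""
    value = sample[rule["metric"]]
    fired = OPS[rule["op"]](value, rule["threshold"])
    return {"metric": rule["metric"], "value": value, "alarm": fired}

result = evaluate({"metric": "temperature", "op": ">", "threshold": 80},
                  {"temperature": 86.5})
```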
8. The automatically scheduled distributed data monitoring method of claim 4, wherein the step of entering the plurality of alarm rules and data logic models into the Kafka queue comprises:
a plurality of alarm rules and a data logic model are entered into a Kafka queue from a plurality of alarm rule modules.
9. The automatically scheduled distributed data monitoring method of claim 4, wherein the step of entering the plurality of alarm rules and data logic models into the Kafka queue comprises:
a plurality of alarm rules and data logic models are loaded from the MYSQL database into the Kafka queue.
10. The automatically scheduled distributed data monitoring method of claim 4, wherein the data logic model comprises:
a plurality of namespaces, wherein the number of the namespaces is determined by a monitoring request of a monitored object;
a plurality of metrics respectively subordinate to the plurality of namespaces, wherein the number of metrics is determined by the monitoring request of the monitored object;
and a plurality of dimensions respectively subordinate to the plurality of metrics, wherein the number of dimensions is determined by the monitoring request of the monitored object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210314581.0A CN114422339B (en) | 2022-03-29 | 2022-03-29 | Automatic scheduling distributed data monitoring system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114422339A true CN114422339A (en) | 2022-04-29 |
CN114422339B CN114422339B (en) | 2022-07-01 |
Family
ID=81262784
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210314581.0A Active CN114422339B (en) | 2022-03-29 | 2022-03-29 | Automatic scheduling distributed data monitoring system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114422339B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018103315A1 (en) * | 2016-12-09 | 2018-06-14 | 上海壹账通金融科技有限公司 | Monitoring data processing method, apparatus, server and storage equipment |
CN108234199A (en) * | 2017-12-20 | 2018-06-29 | 中国联合网络通信集团有限公司 | Monitoring method, apparatus and system based on Kafka |
CN108270618A (en) * | 2017-12-30 | 2018-07-10 | 杭州华为数字技术有限公司 | Alert the method, apparatus and warning system of judgement |
CN109709389A (en) * | 2018-11-30 | 2019-05-03 | 珠海派诺科技股份有限公司 | For electric instrument distributed mass real time data sampling alarm method and system |
US20200160230A1 (en) * | 2018-11-19 | 2020-05-21 | International Business Machines Corporation | Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs |
CN111190798A (en) * | 2020-01-03 | 2020-05-22 | 苏宁云计算有限公司 | Service data monitoring and warning device and method |
CN111190794A (en) * | 2019-12-30 | 2020-05-22 | 天津浪淘科技股份有限公司 | Operation and maintenance monitoring and management system |
WO2021184586A1 (en) * | 2020-03-18 | 2021-09-23 | 平安科技(深圳)有限公司 | Private cloud monitoring method and apparatus based on non-flat network, and computer device and storage medium |
CN113448812A (en) * | 2021-07-15 | 2021-09-28 | 中国银行股份有限公司 | Monitoring alarm method and device under micro-service scene |
Non-Patent Citations (1)
Title |
---|
HAO Penghai et al.: "Cloud platform monitoring and alarm *** based on Kafka and Kubernetes" (基于Kafka和Kubernetes的云平台监控告警***), 《计算机***应用》 *
Also Published As
Publication number | Publication date |
---|---|
CN114422339B (en) | 2022-07-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112231075B (en) | Cloud service-based server cluster load balancing control method and system | |
WO2020147336A1 (en) | Micro-service full-link monitoring system and method | |
CN112015753B (en) | Monitoring system and method suitable for containerized deployment of open source cloud platform | |
CN112202617B (en) | Resource management system monitoring method, device, computer equipment and storage medium | |
US20210182307A1 (en) | System and methods for autonomous monitoring and recovery in hybrid energy management | |
Sathyamoorthy et al. | Energy efficiency as an orchestration service for mobile Internet of Things | |
CN114513542A (en) | Production equipment control method and device, computer equipment and storage medium | |
CN114090378A (en) | Custom monitoring and alarming method based on Kapacitor | |
CN105471938B (en) | Server load management method and device | |
CN105467907A (en) | Automatic inspection system and method | |
CN114422339B (en) | Automatic scheduling distributed data monitoring system and method | |
CN112817992B (en) | Method, apparatus, electronic device and readable storage medium for executing change task | |
CN113342625A (en) | Data monitoring method and system | |
CN112486776A (en) | Cluster member node availability monitoring equipment and method | |
CN115665173B (en) | MQ-based Websocket communication method, system and storage medium | |
CN116383207A (en) | Data tag management method and device, electronic equipment and storage medium | |
WO2023273461A1 (en) | Robot operating state monitoring system, and method | |
Rovnyagin et al. | Cloud computing architecture for high-volume monitoring processing | |
WO2020037634A1 (en) | Information monitoring system and method for industrial control device network, computer readable storage medium, and computer device | |
CN110505301A (en) | A kind of aeronautical manufacture workshop industry big data processing frame | |
CN114861909A (en) | Model quality monitoring method and device, electronic equipment and storage medium | |
CN107844401A (en) | Data monitoring method, device and computer-readable storage medium | |
WO2018083710A2 (en) | An improved management and internetworking of devices to collect and exchange data without requiring interaction | |
Rohm et al. | Enabling resource-awareness for in-network data processing in wireless sensor networks | |
CN113656239A (en) | Monitoring method and device for middleware and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right |
Denomination of invention: A distributed data monitoring system and method for automatic scheduling Effective date of registration: 20230109 Granted publication date: 20220701 Pledgee: Xi'an innovation financing Company limited by guarantee Pledgor: Xi'an Tali Technology Co.,Ltd. Registration number: Y2023610000024 |