CN114422339A - Automatic scheduling distributed data monitoring system and method - Google Patents


Info

Publication number
CN114422339A
Authority
CN
China
Legal status
Granted
Application number
CN202210314581.0A
Other languages
Chinese (zh)
Other versions
CN114422339B (en)
Inventor
郭飞
胡玮
管永权
董晓军
Current Assignee
Xi'an Tali Technology Co ltd
Original Assignee
Xi'an Tali Technology Co ltd
Application filed by Xi'an Tali Technology Co ltd
Priority to CN202210314581.0A
Publication of CN114422339A
Application granted
Publication of CN114422339B
Legal status: Active



Classifications

    • H04L41/0681 Configuration of triggering conditions
    • G06F11/203 Failover techniques using migration
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F11/3093 Configuration details thereof (i.e. of G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents), e.g. installation, enabling, spatial arrangement of the probes
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/544 Buffers; Shared memory; Pipes
    • G06F9/546 Message passing systems or structures, e.g. queues
    • H04L41/022 Multivendor or multi-standard integration
    • H04L41/0253 Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols using browsers or web-pages for accessing management information
    • H04L41/046 Network management architectures or arrangements comprising network management agents or mobile agents therefor
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H04L67/1004 Server selection for load balancing
    • G06F2201/80 Database-specific techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Environmental & Geological Engineering (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an automatically scheduled distributed data monitoring system and method and belongs to the technical field of monitoring systems. The system comprises an Nginx reverse proxy server, a plurality of alarm rule modules, a Kafka queue and a plurality of alarm calculation modules. The invention addresses two problems: the computational performance problems caused by massive numbers of alarm rules, and the unavailability of the system when an alarm calculation module goes down. It also solves the problem of information loss that occurs when alarm rules are created in batches and cannot be processed in time during peak periods. Finally, the invention provides a method for configuring alarm rules along multiple dimensions, such as the user, time and the monitored object.

Description

Automatic scheduling distributed data monitoring system and method
Technical Field
The invention belongs to the technical field of monitoring systems, and particularly relates to an automatic scheduling distributed data monitoring system and method.
Background
In cloud computing and Internet of Things (IoT) systems, all business modules must work cooperatively, and the internal data of these modules are heterogeneous and loosely coupled. Such systems require a stable and available monitoring system to ensure their health and stability. A data monitoring system plays an important role in guaranteeing data security and data service quality, so research on monitoring systems is of great significance.
Conventional monitoring systems generally use a data-interface approach. A third-party service either pushes data to the monitoring system or has its data pulled by the monitoring system, and rule calculation is then performed centrally in the monitoring system to decide whether to raise an alarm. With this architecture, the monitoring system easily reaches performance bottlenecks in data access and data processing when many third-party data points must be integrated and many alarm rules operate on the data. The system also scales poorly.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an automatic scheduling distributed data monitoring system and method.
In order to achieve the above purpose, the invention provides the following technical scheme:
an automatically scheduled distributed data monitoring system comprising:
the Nginx reverse proxy server, which is communicatively connected with the monitored object through the HTTP, MQTT and WebSocket protocols;
the plurality of alarm rule modules, which are deployed in a Master-slave architecture and connected with the Nginx reverse proxy server through the HTTP protocol;
the Kafka queue, which is connected with the plurality of alarm rule modules through the HTTP protocol;
the plurality of alarm calculation modules, which are connected over TCP through a Zookeeper server, each alarm calculation module being communicatively connected with the Kafka queue.
Preferably, the Master-slave architecture includes:
a plurality of slave nodes, which are communicatively connected with the Zookeeper server and used for computing services;
a Master node, which is communicatively connected with the Zookeeper server, is mainly used for managing the plurality of slave nodes, and also bears computing services.
Preferably, the system further comprises:
a MySQL database, which is communicatively connected with the plurality of alarm rule modules;
the time-series database OpenTSDB, which is communicatively connected with the Kafka queue and the plurality of alarm calculation modules;
and a centralized cache module, which is communicatively connected with the Kafka queue and the plurality of alarm calculation modules.
An automatically scheduled distributed data monitoring method comprises the following steps:
the Nginx reverse proxy server acquires a monitoring request from a monitored object and stores the monitoring request in the time-series database OpenTSDB;
when an alarm rule module goes down, the plurality of alarm rule modules are redeployed; otherwise, the alarm rule modules each formulate alarm rules and multidimensional data logic models according to the monitoring request of the monitored object, and store the alarm rules and data logic models in the MySQL database;
the Kafka queue decouples the plurality of alarm rule modules from the plurality of alarm calculation modules;
the plurality of alarm rules and data logic models enter the Kafka queue;
the Zookeeper server coordinates the plurality of alarm calculation modules so that each one acquires, from the Kafka queue, the alarm rules and data logic models it is responsible for maintaining. The alarm calculation modules then each perform calculations according to those alarm rules and data logic models, obtaining a plurality of monitoring alarm results in parallel;
and the monitoring alarm results are pushed to the monitored object or a third-party system through the Kafka queue.
Preferably, when an alarm rule module goes down, the step of redeploying the plurality of alarm rule modules includes:
when the cluster detection on the alarm rule module of the Master node detects that a slave is down,
the alarm rule module of the Master node obtains the IP address of the failed slave alarm rule module from the Zookeeper server;
the alarm rule module of the Master node acquires the alarm rules corresponding to the IP address of the failed slave alarm rule module from the cached data of the centralized cache module;
the alarm rule module of the Master node sends the alarm rules corresponding to the IP address of the failed machine to the Kafka queue;
and the alarm rule module of the Master node and the surviving slave alarm rule modules form a new Master-slave architecture.
Preferably, when an alarm rule module goes down, the step of redeploying the plurality of alarm rule modules includes:
when the cluster detection on the alarm rule module of any slave node detects that the alarm rule module of the Master node is down,
the Zookeeper server selects a slave alarm rule module as the new Master alarm rule module, and the new Master alarm rule module obtains the IP address of the failed Master alarm rule module from the Zookeeper server;
the new Master alarm rule module acquires the alarm rules corresponding to the IP address of the failed Master alarm rule module from the cache;
the new Master alarm rule module sends the alarm rules corresponding to the IP address of the failed Master to the Kafka queue;
and the alarm rule module of the new Master node and the remaining slave alarm rule modules form a new Master-slave architecture.
Preferably, the step in which the plurality of alarm calculation modules perform calculations according to the alarm rules and data logic models to obtain a plurality of monitoring alarm results in parallel comprises:
each alarm calculation module converts the data logic model into a monitoring request and obtains the data of the monitored object through that request;
and the alarm calculation module evaluates the data of the monitored object against the alarm rule to obtain a monitoring alarm result.
Preferably, the step in which the plurality of alarm rules and data logic models enter the Kafka queue comprises:
the plurality of alarm rules and data logic models are sent into the Kafka queue by the plurality of alarm rule modules.
Preferably, the step in which the plurality of alarm rules and data logic models enter the Kafka queue comprises:
the plurality of alarm rules and data logic models are loaded from the MySQL database into the Kafka queue.
Preferably, the data logic model includes:
a plurality of namespaces, the number of which is determined by the monitoring request of the monitored object;
a plurality of metrics, each subordinate to one of the namespaces, the number of which is determined by the monitoring request of the monitored object;
a plurality of dimensions, each subordinate to one of the metrics, the number of which is determined by the monitoring request of the monitored object.
The automatically scheduled distributed data monitoring system and method provided by the invention have the following beneficial effects. (1) High availability, particularly in large-data-volume environments: the distributed design allows the system to handle a greater number of alarms per unit time, so the system can be applied in big-data service scenarios. (2) High reliability: in the distributed cluster design, if any alarm calculation node goes down, the alarm rule set it was maintaining is dynamically transferred to other calculation nodes, so data held in system memory are not lost because of hardware instability in a complex environment. (3) Multi-dimensional alarm rules can be configured for different monitored objects with different time granularities and different alarm calculation methods, which meets the varied alarm monitoring requirements of monitored objects in complex scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some embodiments of the invention and it will be clear to a person skilled in the art that other drawings can be derived from them without inventive effort.
Fig. 1 is a structural diagram of an automatically scheduled distributed data monitoring system according to embodiment 1 of the present invention;
Fig. 2 is a flowchart of an automatically scheduled distributed data monitoring method according to embodiment 1 of the present invention;
Fig. 3 is a producer-consumer diagram of alarm rules in embodiment 1 of the present invention;
Fig. 4 is a flowchart of alarm rule handling when a slave goes down according to embodiment 1 of the present invention;
Fig. 5 is a flowchart of alarm rule handling when the Master goes down according to embodiment 1 of the present invention;
Fig. 6 is an interaction diagram of the alarm calculation module according to embodiment 1 of the present invention;
Fig. 7 is a tree structure diagram of the data logic model in embodiment 1 of the present invention.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present invention and can practice the same, the present invention will be described in detail with reference to the accompanying drawings and specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "axial", "radial", "circumferential", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing technical solutions of the present invention and simplifying the description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In the description of the present invention, it should be noted that, unless explicitly stated or limited otherwise, the terms "connected" and "coupled" are to be interpreted broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct or indirect through an intermediate medium. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific situation. In the description of the present invention, unless otherwise specified, "a plurality" means two or more, and this will not be repeated herein.
Example 1
Referring to Fig. 1, an automatically scheduled distributed data monitoring system includes an Nginx reverse proxy server, a plurality of alarm rule modules, a Kafka queue, a plurality of alarm calculation modules, a MySQL database, the time-series database OpenTSDB and a centralized cache module. The Nginx reverse proxy server is communicatively connected with the monitored object through the HTTP, MQTT and WebSocket protocols. The alarm rule modules are deployed in a distributed load-balancing mode and are connected with the Nginx reverse proxy server through the HTTP protocol. The Kafka queue is connected with the alarm rule modules through the HTTP protocol. The alarm calculation modules are connected over TCP through the Zookeeper server. The MySQL database is communicatively connected with the alarm rule modules. The time-series database OpenTSDB is communicatively connected with the Kafka queue and the alarm calculation modules. The centralized cache module is communicatively connected with the Kafka queue and the alarm calculation modules.
In this embodiment, the distributed load-balancing mode is a Master-slave architecture comprising a plurality of slave nodes and a Master node. The slave nodes are communicatively connected with the Zookeeper server and are used for computing services. The Master node is communicatively connected with the Zookeeper server. It mainly manages the slave nodes and also bears computing services.
Referring to Fig. 2, an automatically scheduled distributed data monitoring method includes the following steps. The Nginx reverse proxy server acquires a monitoring request from a monitored object and stores the monitoring request in the time-series database OpenTSDB. When an alarm rule module goes down, the alarm rule modules are redeployed. Otherwise, the alarm rule modules each formulate alarm rules and multidimensional data logic models according to the monitoring request of the monitored object, and store the alarm rules and data logic models in the MySQL database. The Kafka queue decouples the alarm rule modules from the alarm calculation modules. The alarm rules and data logic models enter the Kafka queue. The Zookeeper server coordinates the alarm calculation modules so that each acquires, from the Kafka queue, the alarm rules and data logic models it is responsible for maintaining. The alarm calculation modules then each perform calculations according to those rules and models, obtaining a plurality of monitoring alarm results in parallel. Finally, the monitoring alarm results are pushed to the monitored object or a third-party system through the Kafka queue.
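The decoupling described above can be illustrated with a small in-process sketch. It is not the patented implementation. A standard-library `queue.Queue` stands in for the Kafka queue and threads stand in for alarm calculation modules. The `AlarmRule` fields and all helper names are hypothetical. The producers (alarm rule modules) and the workers (alarm calculation modules) share nothing except the queue.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical minimal rule: alarm when the latest value exceeds threshold."""
    rule_id: str
    metric: str
    threshold: float


def publish_rules(rules, rule_queue):
    """Alarm rule module side: put every rule on the shared queue."""
    for rule in rules:
        rule_queue.put(rule)


def calculation_worker(rule_queue, latest_values, results, lock):
    """Alarm calculation module side: consume rules until the queue is empty."""
    while True:
        try:
            rule = rule_queue.get_nowait()
        except queue.Empty:
            return
        value = latest_values.get(rule.metric)
        if value is not None and value > rule.threshold:
            with lock:
                results.append((rule.rule_id, value))


def run_pipeline(rules, latest_values, workers=3):
    """Publish all rules first, then let several workers evaluate them in parallel."""
    rule_queue = queue.Queue()
    publish_rules(rules, rule_queue)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=calculation_worker,
                         args=(rule_queue, latest_values, results, lock))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Sorted so the output does not depend on thread scheduling.
    return sorted(results)
```

For example, `run_pipeline([AlarmRule("r1", "temperature", 80.0)], {"temperature": 85.0})` yields one alarm for `r1`. In the described system, the results would be pushed back through Kafka rather than returned.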
In this embodiment, the alarm rule module is mainly responsible for setting alarm rules and managing the data of the monitored object. A series of alarm rule models is established for the monitored objects the user is concerned with, according to actual requirements, through HTTP requests over a local area network or wide area network. In this embodiment, an alarm threshold can be configured. Threshold comparison rules such as "greater than", "greater than or equal to" and "less than or equal to" are supported, and the supported data aggregation modes are Sum, Average, Max, Min and Variance. An alarm rule can also configure the calculation period of the monitored data and the number of times the calculation result must exceed the threshold.
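The rule options listed above can be sketched as follows. This is an illustration under stated assumptions, not the patent's code. Samples are grouped into fixed calculation periods, and each period is aggregated with one of the named modes. The alarm fires when the required number of most recent periods all breach the threshold. Three choices here are assumptions: treating "number of times" as consecutive periods, using population variance, and adding a "<" operator for completeness. The class and function names are hypothetical.

```python
import operator
import statistics
from dataclasses import dataclass

# Threshold comparisons; "<" is added by assumption next to the listed ones.
COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Aggregation modes named in the text; population variance is an assumption.
AGGREGATORS = {
    "Sum": sum,
    "Average": statistics.fmean,
    "Max": max,
    "Min": min,
    "Variance": statistics.pvariance,
}


@dataclass(frozen=True)
class ThresholdRule:
    aggregation: str   # key of AGGREGATORS
    comparator: str    # key of COMPARATORS
    threshold: float
    period_s: int      # calculation period in seconds
    times: int         # consecutive breaching periods required (assumption)


def aggregate_periods(samples, rule):
    """Group (timestamp, value) samples into fixed periods and aggregate each.

    Periods without any samples are simply skipped in this sketch.
    """
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts // rule.period_s, []).append(value)
    agg = AGGREGATORS[rule.aggregation]
    return [agg(buckets[k]) for k in sorted(buckets)]


def should_alarm(samples, rule):
    """Fire when the last `rule.times` aggregated periods all breach the threshold."""
    results = aggregate_periods(samples, rule)
    if len(results) < rule.times:
        return False
    compare = COMPARATORS[rule.comparator]
    return all(compare(v, rule.threshold) for v in results[-rule.times:])
```

With 60-second periods, the samples `(0, 70), (30, 90), (60, 85), (90, 95), (120, 82), (150, 84)` average to 80, 90 and 83 per period. An "Average > 80 for 2 periods" rule therefore fires, while a 3-period version does not.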
Once the alarm calculation modules adopt a distributed architecture, each node maintains only a subset of the alarm rules. If a node goes down, the subset it maintained would in principle be lost entirely, making the system unreliable, as shown in Fig. 3.
To address this, the invention uses the service coordination capability of Zookeeper in a Master-slave mode, in which the Master node manages all nodes. When a slave is detected to be down, the Master node can acquire information such as the IP address of the failed machine. It uses this information to match all alarm rules corresponding to that IP address in the cache or database, and then resends the alarm rule set of the failed node to the Kafka queue. The surviving alarm rule calculation nodes consume the message queue again, each picking up an arbitrary, unordered portion of the resent rules. After the queue has been consumed, the remaining nodes together still maintain the complete set of alarm rules, which gives the distributed system high reliability.
Referring to Fig. 4, when a slave goes down, the step of redeploying the plurality of alarm rule modules includes the following. When the cluster detection on the alarm rule module of the Master node detects that a slave is down, the alarm rule module of the Master node obtains the IP address of the failed slave alarm rule module from the Zookeeper server. It then acquires the alarm rules corresponding to that IP address from the cached data of the centralized cache module and sends those alarm rules to the Kafka queue. Finally, the alarm rule module of the Master node and the surviving slave alarm rule modules form a new Master-slave architecture.
Referring to Fig. 5, when the Master goes down, the step of redeploying the plurality of alarm rule modules includes the following. When the cluster detection on the alarm rule module of any slave node detects that the alarm rule module of the Master node is down, the Zookeeper server selects a slave alarm rule module as the new Master alarm rule module. The new Master alarm rule module obtains the IP address of the failed Master alarm rule module from the Zookeeper server, acquires the alarm rules corresponding to that IP address from the cache, and sends those alarm rules to the Kafka queue. Finally, the alarm rule module of the new Master node and the remaining slave alarm rule modules form a new Master-slave architecture.
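The slave-failure steps above can be sketched with plain data structures. This is a hypothetical illustration, not the patent's code. A dict keyed by node IP stands in for the centralized cache, a set stands in for the membership view registered in Zookeeper, and a list stands in for the Kafka queue. In a real deployment, `failed_ip` would come from a Zookeeper watch notification. The class, method and IP values are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class RuleCluster:
    """Hypothetical in-memory stand-ins for the cache, Zookeeper view and Kafka."""
    rules_by_ip: dict            # centralized cache: node IP -> rules it maintained
    live_nodes: set              # membership as seen through Zookeeper
    kafka_queue: list = field(default_factory=list)

    def handle_slave_down(self, failed_ip):
        """Master-side recovery, following the order of the steps in the text."""
        if failed_ip not in self.live_nodes:
            return []
        # 1. Look up the rules the failed slave maintained, by its IP address.
        orphaned = self.rules_by_ip.pop(failed_ip, [])
        # 2. Re-publish them so surviving nodes can consume them again.
        self.kafka_queue.extend(orphaned)
        # 3. The Master and surviving slaves form the new Master-slave set.
        self.live_nodes.discard(failed_ip)
        return orphaned
```

Re-publishing through the queue, instead of assigning the rules directly to a chosen node, leaves it to the surviving consumers to share the orphaned rules.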
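The text does not say how the Zookeeper server picks the new Master. One common approach, used here as an assumption, is the standard ZooKeeper leader-election recipe: each node registers an ephemeral sequential znode, and the live node with the lowest sequence number leads. The sketch below simulates this with a dict from sequence number to IP address, without a real ZooKeeper client. All function names and values are hypothetical.

```python
def elect_master(members):
    """Return the IP of the live member with the lowest sequence number.

    `members` maps a ZooKeeper-style sequence number (as assigned to an
    ephemeral sequential znode) to the node's IP address.
    """
    if not members:
        raise ValueError("no live alarm rule modules")
    return members[min(members)]


def handle_master_down(members, failed_seq, rules_by_ip, kafka_queue):
    """Fail over a crashed Master and re-publish the rules it maintained."""
    failed_ip = members.pop(failed_seq)      # its ephemeral znode disappears
    new_master = elect_master(members)       # lowest remaining sequence wins
    orphaned = rules_by_ip.pop(failed_ip, [])
    kafka_queue.extend(orphaned)             # surviving nodes consume these
    return new_master, orphaned
```

The lowest-sequence rule is deterministic, so every surviving node reaches the same conclusion about who the new Master is.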
The open-source time-series database OpenTSDB stores data using a logical structure of four parts: Metric, Tags, Value and Timestamp. To store the monitoring data, the namespace and dimension in the monitoring data logic model are both converted into Tags in the OpenTSDB storage logic, and the metric in the data logic model is converted into the Metric in OpenTSDB. The acquired data values are then stored in the TSDB database by tags, metric and time series.
As shown in Fig. 6, the step in which the alarm calculation module obtains a monitoring alarm result from the alarm rule and the data logic model includes the following. The alarm calculation module converts the data logic model into a monitoring request and obtains the data of the monitored object through that request. It then evaluates the data of the monitored object against the alarm rule to obtain a monitoring alarm result.
In this embodiment, the alarm rules and data logic models can enter the Kafka queue in two ways: they are sent into the Kafka queue by the alarm rule modules, or they are loaded from the MySQL database into the Kafka queue.
The alarm calculation module acts as a consumer of the Kafka queue. It consumes alarm rules from the Kafka queue and stores them in the memory of its process, and a timer task then periodically calculates the alarm data according to the calculation period specified in each alarm rule. The Kafka queue holds the complete set of alarm rules of the whole system, while each alarm calculation module process consumes only a subset of them. This greatly reduces the load on each alarm calculation module and keeps the system running healthily and stably.
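The mapping above can be sketched as a function that builds one OpenTSDB data point (metric, timestamp, value, tags) from a sample of the data logic model. Two choices are assumptions: the tag key `namespace`, and representing dimensions as key/value pairs such as `{"device": "device-1"}`. The JSON array shape matches what OpenTSDB's HTTP `/api/put` endpoint accepts. No HTTP call is made here, and the helper names are hypothetical.

```python
import json


def to_opentsdb_point(namespace, metric, dimensions, value, timestamp):
    """Map one data-logic-model sample to an OpenTSDB data point.

    The namespace and every dimension become tags; the metric stays the
    OpenTSDB metric. The tag key "namespace" is a hypothetical naming choice.
    """
    tags = {"namespace": namespace}
    tags.update(dimensions)
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}


def put_body(points):
    """Serialise points as a JSON array, the body shape used with /api/put."""
    return json.dumps(points)
```

Because the namespace is always turned into a tag, every point carries at least one tag, which OpenTSDB requires.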
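One possible reading of "converts the data logic model into a monitoring request" is building a query against OpenTSDB, which the system uses as its time-series store. The sketch below makes that assumption and produces a `/api/query` JSON body. It pins the namespace and every dimension with `literal_or` tag filters (OpenTSDB 2.2+ filter syntax), so exactly one series matches and the cross-series aggregator (`sum`, used as a placeholder) has no effect. Per-period aggregation is left to the alarm calculation module. The function name and tag keys are hypothetical.

```python
def build_query(namespace, metric, dimensions, start, end):
    """Turn a data-logic-model node into an OpenTSDB /api/query request body.

    Every tag (namespace plus each dimension) is pinned with a literal_or
    filter, so a single series matches and the "sum" aggregator is a no-op.
    """
    tags = {"namespace": namespace, **dimensions}
    filters = [
        {"type": "literal_or", "tagk": k, "filter": v, "groupBy": False}
        for k, v in sorted(tags.items())
    ]
    return {
        "start": start,
        "end": end,
        "queries": [{"aggregator": "sum", "metric": metric, "filters": filters}],
    }
```

The returned dict would then be serialised to JSON and sent to the database, and the data points in the response would be evaluated against the alarm rule.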
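How "each process consumes a subset" can still add up to the complete set is sketched below. This is a simulation under stated assumptions, not Kafka itself. Rules are keyed by rule ID onto partitions; `zlib.crc32` stands in for Kafka's default murmur2 key hashing. Partitions are dealt round-robin to consumers, similar in spirit to Kafka's round-robin consumer-group assignor. A timer tick selects the rules whose period divides the current second, which assumes ticks aligned to whole seconds. All names are hypothetical.

```python
import zlib


def partition_for(rule_id, num_partitions):
    """Stable key -> partition mapping (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(rule_id.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Deal partitions round-robin to the alarm calculation modules."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def rules_per_consumer(rule_ids, num_partitions, consumers):
    """Which alarm rules each alarm calculation module ends up holding in memory."""
    owner = {}
    for consumer, parts in assign_partitions(num_partitions, consumers).items():
        for p in parts:
            owner[p] = consumer
    held = {c: set() for c in consumers}
    for rule_id in rule_ids:
        held[owner[partition_for(rule_id, num_partitions)]].add(rule_id)
    return held


def due_rules(periods_s, now_s):
    """Timer tick: rules whose calculation period (seconds) divides now_s."""
    return sorted(r for r, period in periods_s.items() if now_s % period == 0)
```

Each rule lands on exactly one consumer, so the subsets are disjoint and their union is the full rule set. If a consumer leaves, reassigning partitions among the remaining consumers keeps every rule covered.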
In this embodiment, the data logic model includes a plurality of namespaces, a plurality of metrics and a plurality of dimensions. The metrics are subordinate to the namespaces, and the dimensions are subordinate to the metrics. The numbers of namespaces, metrics and dimensions are determined by the monitoring request of the monitored object. Referring to Fig. 7, suppose the monitored object is an Internet of Things system with two subsystems, Subsystem-1 and Subsystem-2, which are two different namespaces. If the monitoring system needs to monitor temperature and equipment rotation speed in the IoT system, the metrics are temperature and speed. Subsystem-1 monitors device 1 in area 1, so each metric in Subsystem-1 has two dimensions, which can be named area-1 and device-1. Subsystem-2 has only one area, and the system monitors the data on device 2, so it has only one dimension, device-2.
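The namespace, metric and dimension tree of the Fig. 7 example can be sketched with two small dataclasses. One point is an assumption: the text does not state explicitly that both metrics (temperature and speed) apply to both subsystems, so the sketch models it that way. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    dimensions: list = field(default_factory=list)   # dimension names


@dataclass
class Namespace:
    name: str
    metrics: list = field(default_factory=list)      # Metric objects


def build_fig7_model():
    """Fig. 7 example; both metrics under both namespaces is an assumption."""
    sub1 = Namespace("Subsystem-1", [
        Metric("temperature", ["area-1", "device-1"]),
        Metric("speed", ["area-1", "device-1"]),
    ])
    sub2 = Namespace("Subsystem-2", [
        Metric("temperature", ["device-2"]),
        Metric("speed", ["device-2"]),
    ])
    return [sub1, sub2]


def leaf_paths(model):
    """Flatten the tree into (namespace, metric, dimension) triples."""
    return [(ns.name, m.name, d)
            for ns in model for m in ns.metrics for d in m.dimensions]
```

Flattening the tree this way yields one entry per monitored leaf: six in this example, four under Subsystem-1 and two under Subsystem-2.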
The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, and any simple changes or equivalent substitutions of the technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. An automatically scheduled distributed data monitoring system, comprising:
an Nginx reverse proxy server, communicatively connected to the monitored object via the HTTP (Hypertext Transfer Protocol), MQTT (Message Queuing Telemetry Transport) and WebSocket protocols;
a plurality of alarm rule modules, deployed in a Master-Slave architecture and connected to the Nginx reverse proxy server via the HTTP protocol;
a Kafka queue, connected to the plurality of alarm rule modules via the HTTP protocol;
a plurality of alarm calculation modules, connected to one another over TCP through a Zookeeper server, each alarm calculation module being communicatively connected to the Kafka queue.
2. The automatically scheduled distributed data monitoring system of claim 1, wherein the Master-slave architecture comprises:
a plurality of slave nodes, communicatively connected to the Zookeeper server and used for computing services;
a Master node, communicatively connected to the Zookeeper server and used mainly to manage the plurality of slave nodes while also carrying computing services.
3. The automatically scheduled distributed data monitoring system of claim 1, further comprising:
a MySQL database, communicatively connected to the plurality of alarm rule modules;
a time-series database OpenTSDB, communicatively connected to the Kafka queue and the plurality of alarm calculation modules;
and a centralized cache module, communicatively connected to the Kafka queue and the plurality of alarm calculation modules.
4. An automatically scheduled distributed data monitoring method is characterized by comprising the following steps:
the Nginx reverse proxy server acquires a monitoring request of a monitored object and stores the monitoring request in the time-series database OpenTSDB;
when an alarm rule module goes down, the plurality of alarm rule modules are redeployed; otherwise, the plurality of alarm rule modules each formulate alarm rules and a multidimensional data logic model according to the monitoring request of the monitored object, and store the alarm rules and data logic models in the MySQL database;
the Kafka queue decouples the plurality of alarm rule modules from the plurality of alarm calculation modules;
the plurality of alarm rules and data logic models enter the Kafka queue;
the Zookeeper server coordinates the plurality of alarm calculation modules so that each one obtains, from the Kafka queue, the alarm rules and data logic model that it maintains, and the alarm calculation modules each calculate according to those alarm rules and data logic models, obtaining a plurality of monitoring alarm results in parallel;
and the monitoring alarm results are pushed to the monitored object or a third-party system through the Kafka queue.
5. The method according to claim 4, wherein the step of redeploying the plurality of alarm rule modules when an alarm rule module is down comprises:
when cluster detection on the alarm rule module of the Master node detects that a slave alarm rule module is down,
the alarm rule module of the Master node obtains the IP address of the downed slave alarm rule module from the Zookeeper server;
the alarm rule module of the Master node acquires the alarm rules corresponding to the IP address of the downed slave alarm rule module from the cached data of the centralized cache module;
the alarm rule module of the Master node sends the alarm rules corresponding to the IP address of the downed module to the Kafka queue;
and the alarm rule module of the Master node and the surviving slave alarm rule modules form a new Master-slave architecture.
6. The method according to claim 4, wherein the step of redeploying the plurality of alarm rule modules when an alarm rule module is down comprises:
when cluster detection on the alarm rule module of any slave node detects that the alarm rule module of the Master node is down,
the Zookeeper server elects one slave alarm rule module as the new Master alarm rule module, and the new Master alarm rule module obtains the IP address of the downed Master alarm rule module from the Zookeeper server;
the new Master alarm rule module acquires the alarm rules corresponding to the IP address of the downed Master alarm rule module from the cache;
the new Master alarm rule module sends the alarm rules corresponding to the IP address of the downed Master to the Kafka queue;
and the alarm rule module of the new Master node and the plurality of slave alarm rule modules form a new Master-slave architecture.
7. The distributed data monitoring method of claim 4, wherein the step in which the plurality of alarm calculation modules calculate according to the alarm rules and the data logic model to obtain a plurality of monitoring alarm results in parallel comprises:
the alarm calculation module converts the data logic model into a monitoring request and uses the monitoring request to obtain the data of the monitored object;
and the alarm calculation module applies the alarm rule to the data of the monitored object to obtain a monitoring alarm result.
8. The automatically scheduled distributed data monitoring method of claim 4, wherein the step of entering the plurality of alarm rules and data logic models into the Kafka queue comprises:
the plurality of alarm rules and data logic models are sent into the Kafka queue by the plurality of alarm rule modules.
9. The automatically scheduled distributed data monitoring method of claim 4, wherein the step of entering the plurality of alarm rules and data logic models into the Kafka queue comprises:
the plurality of alarm rules and data logic models are loaded from the MySQL database into the Kafka queue.
10. The automatically scheduled distributed data monitoring method of claim 4, wherein the data logic model comprises:
a plurality of namespaces, the number of namespaces being determined by the monitoring request of the monitored object;
a plurality of metrics, respectively subordinate to the plurality of namespaces, the number of metrics being determined by the monitoring request of the monitored object;
a plurality of dimensions, respectively subordinate to the plurality of metrics, the number of dimensions being determined by the monitoring request of the monitored object.
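The failover steps of claims 5 and 6 can be sketched as a single Python function. Everything here is a hypothetical in-memory stand-in: a dict plays the role of the Zookeeper server's module-to-IP registry, a second dict plays the centralized cache keyed by IP address, and a list plays the Kafka queue. Promoting the lexicographically smallest slave is a simplification of ZooKeeper leader election, not an election rule stated in the claims.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    master: str
    slaves: list[str]
    registry: dict[str, str]     # stand-in for Zookeeper: module name -> IP address
    cache: dict[str, list[str]]  # stand-in for the centralized cache: IP -> alarm rule IDs
    kafka_queue: list[str] = field(default_factory=list)  # stand-in for the Kafka topic


def handle_down(cluster: Cluster, down: str) -> Cluster:
    if down == cluster.master:
        # Claim 6: the Master is down, so promote a slave to be the new Master
        if not cluster.slaves:
            raise RuntimeError("no slave alarm rule module available to promote")
        new_master = min(cluster.slaves)  # simplification of ZooKeeper leader election
        survivors = [s for s in cluster.slaves if s != new_master]
    elif down in cluster.slaves:
        # Claim 5: a slave is down, so the existing Master handles recovery
        new_master = cluster.master
        survivors = [s for s in cluster.slaves if s != down]
    else:
        raise ValueError(f"unknown alarm rule module: {down}")
    ip = cluster.registry.pop(down)        # IP address of the downed module
    orphaned = cluster.cache.get(ip, [])   # its alarm rules, read from the cache
    cluster.kafka_queue.extend(orphaned)   # re-publish so calculation modules pick them up
    cluster.master, cluster.slaves = new_master, survivors  # new Master-slave architecture
    return cluster
```

In both branches the recovering Master looks up the failed module's IP address, pulls that module's rules from the cache and republishes them to the queue, so no alarm rule is lost when a rule module fails.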

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210314581.0A CN114422339B (en) 2022-03-29 2022-03-29 Automatic scheduling distributed data monitoring system and method

Publications (2)

Publication Number Publication Date
CN114422339A (en) 2022-04-29
CN114422339B (en) 2022-07-01

Family

ID=81262784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210314581.0A Active CN114422339B (en) 2022-03-29 2022-03-29 Automatic scheduling distributed data monitoring system and method

Country Status (1)

Country Link
CN (1) CN114422339B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018103315A1 (en) * 2016-12-09 2018-06-14 Shanghai OneConnect Financial Technology Co., Ltd. Monitoring data processing method, apparatus, server and storage equipment
CN108234199A (en) * 2017-12-20 2018-06-29 China United Network Communications Group Co., Ltd. Monitoring method, apparatus and system based on Kafka
CN108270618A (en) * 2017-12-30 2018-07-10 Hangzhou Huawei Digital Technology Co., Ltd. Alert the method, apparatus and warning system of judgement
CN109709389A (en) * 2018-11-30 2019-05-03 Zhuhai Pilot Technology Co., Ltd. For electric instrument distributed mass real time data sampling alarm method and system
US20200160230A1 (en) * 2018-11-19 2020-05-21 International Business Machines Corporation Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs
CN111190798A (en) * 2020-01-03 2020-05-22 Suning Cloud Computing Co., Ltd. Service data monitoring and warning device and method
CN111190794A (en) * 2019-12-30 2020-05-22 Tianjin Langtao Technology Co., Ltd. Operation and maintenance monitoring and management system
WO2021184586A1 (en) * 2020-03-18 2021-09-23 Ping An Technology (Shenzhen) Co., Ltd. Private cloud monitoring method and apparatus based on non-flat network, and computer device and storage medium
CN113448812A (en) * 2021-07-15 2021-09-28 Bank of China Limited Monitoring alarm method and device under micro-service scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hao Penghai et al.: "Cloud platform monitoring and alarm *** based on Kafka and Kubernetes", Computer *** Applications *

Also Published As

Publication number Publication date
CN114422339B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN112231075B (en) Cloud service-based server cluster load balancing control method and system
WO2020147336A1 (en) Micro-service full-link monitoring system and method
CN112015753B (en) Monitoring system and method suitable for containerized deployment of open source cloud platform
CN112202617B (en) Resource management system monitoring method, device, computer equipment and storage medium
US20210182307A1 (en) System and methods for autonomous monitoring and recovery in hybrid energy management
Sathyamoorthy et al. Energy efficiency as an orchestration service for mobile Internet of Things
CN114513542A (en) Production equipment control method and device, computer equipment and storage medium
CN114090378A (en) Custom monitoring and alarming method based on Kapacitor
CN105471938B (en) Server load management method and device
CN105467907A (en) Automatic inspection system and method
CN114422339B (en) Automatic scheduling distributed data monitoring system and method
CN112817992B (en) Method, apparatus, electronic device and readable storage medium for executing change task
CN113342625A (en) Data monitoring method and system
CN112486776A (en) Cluster member node availability monitoring equipment and method
CN115665173B (en) MQ-based Websocket communication method, system and storage medium
CN116383207A (en) Data tag management method and device, electronic equipment and storage medium
WO2023273461A1 (en) Robot operating state monitoring system, and method
Rovnyagin et al. Cloud computing architecture for high-volume monitoring processing
WO2020037634A1 (en) Information monitoring system and method for industrial control device network, computer readable storage medium, and computer device
CN110505301A (en) A kind of aeronautical manufacture workshop industry big data processing frame
CN114861909A (en) Model quality monitoring method and device, electronic equipment and storage medium
CN107844401A (en) Data monitoring method, device and computer-readable storage medium
WO2018083710A2 (en) An improved management and internetworking of devices to collect and exchange data without requiring interaction
Rohm et al. Enabling resource-awareness for in-network data processing in wireless sensor networks
CN113656239A (en) Monitoring method and device for middleware and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A distributed data monitoring system and method for automatic scheduling

Effective date of registration: 20230109

Granted publication date: 20220701

Pledgee: Xi'an innovation financing Company limited by guarantee

Pledgor: Xi'an Tali Technology Co.,Ltd.

Registration number: Y2023610000024
