CN111722980B - Data acquisition system and method - Google Patents


Info

Publication number
CN111722980B
Authority
CN
China
Prior art keywords
node
telegraf
leader
leader node
election event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010529803.1A
Other languages
Chinese (zh)
Other versions
CN111722980A (en)
Inventor
徐晶
李琳
张晓颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd and China Mobile Communications Group Co Ltd
Priority to CN202010529803.1A
Publication of CN111722980A
Application granted
Publication of CN111722980B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the invention provide a data acquisition system and a data acquisition method. The system comprises: a plurality of servers, each with a Telegraf node deployed on it, the Telegraf node being used to start a corresponding Telegraf subprocess to acquire data; a ZooKeeper distributed framework module comprising a plurality of temporary fields registered on it by the Telegraf nodes; and a Redis database comprising an election event blocking queue for receiving trigger messages of election events among the plurality of Telegraf nodes. In the data acquisition system provided by the embodiments, a Telegraf node is deployed on each of the servers, and when the current Leader node becomes abnormal, a new Leader node is elected to carry on data acquisition, so that real-time monitoring data is not lost.

Description

Data acquisition system and method
Technical Field
The invention relates to the field of big data, in particular to a data acquisition system and method.
Background
Telegraf is a tool for real-time collection of infrastructure-related performance metric data on the server side. In the prior art, a single Telegraf process is usually deployed on one server and monitors the various infrastructure components and middleware distributed across a plurality of servers.
If the server hosting the Telegraf process goes down or suffers a network fault or similar problem, the Telegraf process cannot collect the relevant data in real time. Data that is not collected in real time is permanently lost, which affects subsequent data analysis.
Disclosure of Invention
To address at least one technical problem in the prior art, embodiments of the invention provide a data acquisition system and a data acquisition method.
In a first aspect, an embodiment of the present invention provides a data acquisition system, including a plurality of servers, a ZooKeeper distributed framework module, and a Redis database, where:
each of the plurality of servers has a Telegraf node deployed on it; exactly one of the Telegraf nodes in the data acquisition system serves as the Leader node, and the Leader node is used for starting a Telegraf subprocess to acquire data;
the Telegraf node comprises an event monitoring node and a message acquisition node; the event monitoring node is used for monitoring whether the triggering condition of a Leader node election event is met and, when it is met, sending a message triggering the Leader node election event to an election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes in the data acquisition system to elect one Telegraf node as the Leader node;
the message acquisition node is configured to, if it obtains a message triggering the Leader node election event from the election event blocking queue, act as the elected Leader node and start a corresponding Telegraf subprocess to perform data acquisition;
the ZooKeeper distributed framework module comprises temporary fields registered on it by the Telegraf nodes; the temporary fields serve as the basis on which the event monitoring node monitors whether the triggering condition of the Leader node election event is met;
the Redis database comprises the election event blocking queue, which is used for storing messages triggering the Leader node election event.
Optionally, the Redis database further includes a distributed lock for determining a Telegraf node that sends a trigger message of the election event to the election event blocking queue.
Optionally, the ZooKeeper distributed framework module further includes a Leader field, configured to store identification information of a server where the Leader node is located.
In a second aspect, an embodiment of the present invention provides a data acquisition method, which is applied to the data acquisition system in the first aspect, including:
monitoring whether the triggering condition of a Leader node election event is met and, when it is met, sending a message triggering the Leader node election event to an election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes in the data acquisition system to elect one Telegraf node as the Leader node;
if a Telegraf node obtains the message triggering the Leader node election event from the election event blocking queue, that node becomes the elected node serving as the Leader node and starts a corresponding Telegraf subprocess to acquire data.
Optionally, the triggering condition of the Leader node election event is specifically:
the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference between the current time and the last reporting time stored in the temporary field registered by the Leader node on the ZooKeeper distributed framework module exceeds a preset threshold.
Optionally, the sending a message for triggering the Leader node election event to the election event blocking queue includes:
a plurality of Telegraf nodes contend for a distributed lock of the Redis database;
the Telegraf node that obtains the distributed lock sends a message triggering the Leader node election event to the election event blocking queue of the Redis database.
Optionally, after the selected node serving as the Leader node starts the corresponding Telegraf sub-process to collect data, the method further includes:
and if the Telegraf subprocess starts abnormally, triggering the Leader node election event again.
Optionally, the method further comprises:
and the ZooKeeper distributed framework module sends an event notification for closing a Telegraf sub-process corresponding to the Leader node.
Optionally, the method further comprises:
determining, by setting a Leader field in the ZooKeeper distributed framework module, one Telegraf node among the plurality of Telegraf nodes as the Leader node to acquire data.
Optionally, the determining, by setting a Leader field in the ZooKeeper distributed framework module, of one Telegraf node among the plurality of Telegraf nodes as the Leader node to acquire data includes:
setting the Leader field in the ZooKeeper distributed framework module to the identification information of the server where one of the plurality of Telegraf nodes is located;
each Telegraf node reads the Leader field in the ZooKeeper distributed framework module, and if the Leader field holds the identification information of the server where that Telegraf node is located, the Telegraf node starts a corresponding Telegraf subprocess to acquire data.
According to the data acquisition system provided by the embodiment of the invention, the Telegraf nodes are respectively deployed on the servers, and when the current Leader node is abnormal, a new Leader node is selected for data acquisition, so that the loss of real-time monitoring data is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data acquisition system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data acquisition method according to an embodiment of the invention;
FIG. 3 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a data acquisition method according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic structural diagram of a data acquisition system according to an embodiment of the present invention, as shown in fig. 1, where the system includes:
a plurality of servers 110, each with a Telegraf node 111 deployed on it; exactly one of the Telegraf nodes 111 in the data acquisition system serves as the Leader node, and the Leader node is used for starting a Telegraf subprocess 112 to acquire data;
the Telegraf node 111 comprises an event monitoring node and a message acquisition node; the event monitoring node is used for monitoring whether the triggering condition of a Leader node election event is met and, when it is met, sending a message triggering the Leader node election event to an election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes 111 in the data acquisition system to elect one Telegraf node as the Leader node;
the message acquisition node is configured to, if it obtains a message triggering the Leader node election event from the election event blocking queue, act as the elected node serving as the Leader node and start a corresponding Telegraf subprocess 112 to perform data acquisition;
a ZooKeeper distributed framework module 120 comprising a plurality of temporary fields 121 registered on it by the Telegraf nodes; the temporary fields serve as the basis on which the event monitoring node monitors whether the triggering condition of the Leader node election event is met;
a Redis database 130 comprising an election event blocking queue 131 for storing messages triggering the Leader node election event.
Specifically, the invention is applied to a multi-server environment, and Telegraf nodes 111 are deployed on each server 110, i.e. a plurality of mutually independent Telegraf nodes 111 exist in the multi-server environment. One of the important functions of the Telegraf node 111 is to start the corresponding Telegraf sub-process 112 for data acquisition. More generally, the Telegraf node 111 may be configured to manage a lifecycle of the corresponding Telegraf sub-process 112, e.g., start, stop, etc., the Telegraf sub-process 112.
The Telegraf subprocess 112 is a process that monitors performance metrics of the operating system and middleware on the server side. It provides an out-of-the-box set of monitoring data acquisition plug-ins for many kinds of server-side infrastructure and middleware; users can configure the plug-in set as required and adjust the parameters of individual plug-ins, after which the Telegraf subprocess can be put to use in a production environment. The Telegraf node 111 in the embodiment of the present invention may serve as the parent process of the Telegraf subprocess 112. In particular, the Telegraf node 111 is typically implemented as a lightweight script, for example in Bash or Python.
Specifically, the Telegraf node 111 also judges the triggering condition of an election event among the Telegraf nodes 111 and listens for the trigger message of the election event from the Redis database 130. An election event among the plurality of Telegraf nodes 111 means the following: in the embodiment of the invention, only one of the plurality of Telegraf nodes 111, namely the Leader node, starts a Telegraf subprocess and is responsible for data acquisition; when the Leader node runs into any of various anomalies, a new Telegraf node needs to be elected as the sole Leader node to replace the original Leader node and continue the data acquisition task.
Further, when the Leader node runs into any of various anomalies, the Telegraf node 111 judges whether the triggering condition of the election event is met. When the triggering condition is met, a trigger message of the election event appears in the Redis database 130, and because the Telegraf node 111 listens for this trigger message in the Redis database 130 in real time, it learns promptly that the election event has been triggered and takes part in it.
Specifically, the Redis database 130 in the embodiment of the present invention is used as a lightweight message queue. Because of its memory-based working mode, it performs well when the volume of communicated data is small, which makes it suitable as the target container for election event notification and monitoring in the embodiment of the present invention. The Redis database 130 specifically includes an election event blocking queue 131, which is used to receive the trigger messages of election events among the plurality of Telegraf nodes; that is, a Telegraf node sends the trigger message of the election event to the election event blocking queue 131.
To ensure that the trigger message is unique, only one trigger message is needed each time an election event is triggered. Therefore, to prevent several Telegraf nodes from sending trigger messages of the election event to the election event blocking queue at the same time, the Redis database 130 further includes a distributed lock 132. The Telegraf nodes must contend for the distributed lock, and only the Telegraf node that obtains it can send the trigger message to the election event blocking queue, which guarantees the uniqueness of the trigger message.
The data acquisition system in the embodiment of the present invention further includes a ZooKeeper distributed framework module 120, which is used as a program coordination service, where the ZooKeeper distributed framework module 120 in the embodiment of the present invention plays a role of a target container for state synchronization in the system, and implements its function by the following fields:
an initialization field for identifying whether the system has completed an initialization procedure;
a Leader field, configured to store identification information of a server where the Leader node is located, for example, a hostname or an ip address of the server;
temporary fields, i.e. a plurality of temporary fields registered by the Telegraf node on the ZooKeeper distributed framework module. If the Telegraf node is a Leader node, the temporary field registered by the Leader node is used for storing the last reporting time of the Leader node; if the Telegraf node is not a Leader node, the temporary field registered by the Telegraf node is used for storing the health state of the Telegraf node, and when the value is true, the node is indicated to run normally at present and has the right of being elected as the Leader node, otherwise, the value is false.
Further, in the embodiment of the present invention, the Leader node reports its collected data at a preset frequency during data acquisition, and the last reporting time of the Leader node is the most recent time at which it collected and reported data normally. The last reporting time therefore characterizes the health status of the Leader node.
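The field layout described above can be sketched as a small data structure. This is a minimal illustration only: the class `CoordinationState`, its method names, and the host names are hypothetical, and a plain in-memory object stands in for the ZooKeeper distributed framework module rather than using any ZooKeeper client.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class CoordinationState:
    """In-memory stand-in (assumption) for the fields kept in ZooKeeper."""

    initialized: bool = False      # initialization field
    leader: Optional[str] = None   # Leader field: host name or IP address
    # Temporary fields keyed by server identity:
    #   Leader node    -> float (last reporting time, epoch seconds)
    #   candidate node -> bool  (health state)
    temporary: Dict[str, Union[float, bool]] = field(default_factory=dict)

    def register(self, host: str) -> None:
        """Register a temporary field for a newly started Telegraf node."""
        self.temporary.setdefault(host, False)

    def report(self, host: str, now: float) -> None:
        """Store the Leader node's last reporting time."""
        if host != self.leader:
            raise ValueError("only the Leader node stores a reporting time")
        self.temporary[host] = now

    def set_health(self, host: str, healthy: bool) -> None:
        """Store a candidate node's health state."""
        self.temporary[host] = healthy


state = CoordinationState(leader="host-a")
for host in ("host-a", "host-b"):
    state.register(host)
state.report("host-a", 1000.0)
state.set_health("host-b", True)
```

In a real deployment each temporary field would disappear when its owning node loses its session, which is what the first triggering condition relies on; the dictionary here does not model that.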
According to the data acquisition system provided by the embodiment of the invention, the Telegraf nodes are respectively deployed on the servers, and when the current Leader node is abnormal, a new Leader node is selected for data acquisition, so that the loss of real-time monitoring data is avoided.
On the basis of the foregoing embodiment, fig. 2 is a flowchart of a data acquisition method according to an embodiment of the present invention, where the method is applied to the data acquisition system according to the foregoing embodiment, as shown in fig. 2, and the method includes:
S201, monitoring whether the triggering condition of a Leader node election event is met and, when it is met, sending a message triggering the Leader node election event to an election event blocking queue; the Leader node election event is used for triggering the Telegraf nodes in the data acquisition system to elect one Telegraf node as the Leader node;
Specifically, the embodiment of the present invention applies to the data acquisition system described above, in which the Telegraf node deployed on one of the servers, namely the Leader node, performs the data acquisition task through its corresponding Telegraf subprocess.
To prevent data acquisition from failing when the server goes down, the network fails, or a similar problem occurs, the embodiment of the invention needs to elect a new Leader node to acquire data. First, each candidate node needs to judge whether the triggering condition of an election event among the plurality of Telegraf nodes is met. The candidate nodes are the Telegraf nodes other than the Leader node. The Leader node, like the candidate nodes, is in fact also able to judge the triggering condition of the election event; however, the new Leader node to be elected usually comes from among the candidate nodes.
Further, there are at least two triggering conditions of the election event in the embodiment of the present invention: the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference between the current time and the last reporting time stored by the Leader node in its temporary field on the ZooKeeper distributed framework module exceeds a preset threshold. If either condition is met, the triggering condition of the election event has been reached and a new Leader node needs to be elected.
Specifically, for the first trigger condition, since each Telegraf node including the Leader node in the embodiment of the present invention registers a corresponding temporary field in the ZooKeeper distributed framework module, if the state of the temporary field corresponding to the Leader node changes from present to absent, according to the event notification mechanism of the ZooKeeper, it is indicated that the Leader node does not exist in the system, and possible reasons include: the server where the Leader node is located is down, the Leader node is out of connection with the ZooKeeper distributed framework module due to network failure, and the like, and at this time, the candidate node can receive a notification event from the ZooKeeper distributed framework module.
Specifically, for the second trigger condition, the Leader node stores the last reporting time in its temporary field in the ZooKeeper distributed framework module and updates it at a preset frequency as the data acquisition task proceeds. Meanwhile, each candidate node starts a timed checking thread that checks the last reporting time in the ZooKeeper distributed framework module. If the difference between the last reporting time and the current time exceeds a preset threshold, reporting by the Telegraf subprocess managed by the Leader node is suffering from problems such as network congestion or packet loss, which is also a situation in which an election event needs to be triggered.
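The two triggering conditions can be expressed as a single predicate. The sketch below is illustrative only: the function name `election_triggered` and its parameters are hypothetical, and a plain mapping stands in for the temporary fields read from the ZooKeeper distributed framework module.

```python
from typing import Mapping


def election_triggered(
    temporary: Mapping[str, float],
    leader: str,
    now: float,
    threshold: float,
) -> bool:
    """Return True when a Leader node election event should be triggered.

    temporary maps a server identity to the last reporting time (in seconds)
    stored in that server's temporary field.
    """
    last_report = temporary.get(leader)
    # Condition 1: the Leader node's temporary field no longer exists.
    if last_report is None:
        return True
    # Condition 2: the last reporting time is older than the preset threshold.
    return now - last_report > threshold
```

For example, with a threshold of 30 seconds, a Leader node that last reported 10 seconds ago does not trigger an election, while one that last reported 40 seconds ago does.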
Further, after the candidate node determines that the triggering condition of the election event among the plurality of Telegraf nodes is satisfied, the triggering message of the election event needs to be sent to an election event blocking queue of the Redis database. The election event blocking queue is a message container for acquiring triggering messages of election events among the plurality of Telegraf nodes, namely the Telegraf node sends the triggering messages of the election events to the election event blocking queue, and a sender can be a Leader node or a candidate node.
To ensure the uniqueness of the trigger message, only one trigger message of the election event is needed when the election event is triggered. Therefore, in order to avoid that a plurality of Telegraf nodes send triggering information of an election event to an election event blocking queue at the same time, the Redis database also comprises a distributed lock, the plurality of Telegraf nodes need to preempt the distributed lock, and the Telegraf nodes preempted to the distributed lock can send the triggering information to the election event blocking queue, so that the uniqueness of the triggering information is realized.
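The effect of the distributed lock can be sketched with standard-library primitives. This is a simulation under stated assumptions, not the patented implementation: `threading.Lock` stands in for the Redis distributed lock, `queue.Queue` stands in for the election event blocking queue, and the names `try_trigger_election` and `contend` are hypothetical. The patent does not say how the lock is built; against a real Redis server, one common approach is a `SET` command with the `NX` option and an expiry.

```python
import queue
import threading

# Stand-ins (assumptions): a local lock and queue replace the Redis
# distributed lock and the election event blocking queue.
election_lock = threading.Lock()
election_queue: "queue.Queue[str]" = queue.Queue()


def try_trigger_election(node_id: str) -> bool:
    """Send the trigger message only if this node wins the lock."""
    if not election_lock.acquire(blocking=False):
        return False  # another node already holds the lock
    election_queue.put(f"election triggered by {node_id}")
    # The lock stays held for this election round; a real distributed
    # lock would carry an expiry so that it is eventually released.
    return True


results = []


def contend(node_id: str) -> None:
    results.append(try_trigger_election(node_id))


threads = [
    threading.Thread(target=contend, args=(f"node-{i}",)) for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the five threads interleave, exactly one of them acquires the lock, so the queue ends up holding a single trigger message.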
S202, if a Telegraf node obtains the message triggering the Leader node election event from the election event blocking queue, that node becomes the elected node serving as the Leader node and starts a corresponding Telegraf subprocess to acquire data.
Specifically, after a Telegraf node sends the trigger message of an election event to the election event blocking queue, only one of the plurality of Telegraf nodes can receive it, because the way the trigger message is listened for is exclusive. That is, once one Telegraf node has received the trigger message, the other nodes can no longer receive it, so only one node wins the election event and becomes the elected node, namely the new Leader node.
Specifically, the elected node, as the new Leader node, replaces the original Leader node and continues data acquisition. Concretely, the elected node may start its corresponding Telegraf subprocess and then use that subprocess to acquire data.
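The exclusive delivery described above can be illustrated with a blocking queue from the standard library. This is a sketch under assumptions: `queue.Queue` stands in for the Redis election event blocking queue, and the function `listen`, the node names, and the timeout value are hypothetical.

```python
import queue
import threading

# Stand-in (assumption) for the election event blocking queue in Redis.
election_queue: "queue.Queue[str]" = queue.Queue()
elected = []


def listen(node_id: str, timeout: float = 0.5) -> None:
    """Block on the queue; whichever node receives the message is elected."""
    try:
        election_queue.get(timeout=timeout)
    except queue.Empty:
        return  # another node consumed the trigger message first
    # At this point the elected node would start its Telegraf subprocess.
    elected.append(node_id)


listeners = [
    threading.Thread(target=listen, args=(f"node-{i}",)) for i in range(3)
]
for t in listeners:
    t.start()
election_queue.put("election")  # the single trigger message
for t in listeners:
    t.join()
```

Because a message taken from the queue is removed from it, only one listener ever receives the trigger message; which listener that is depends on thread scheduling.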
According to the data acquisition method provided by the embodiment of the invention, the new Leader node is selected for data acquisition when the current Leader node is abnormal, so that the loss of real-time monitoring data is avoided.
On the basis of any of the above embodiments, fig. 3 is a flowchart of a data acquisition method according to an embodiment of the present invention, and as shown in fig. 3, the method specifically includes a complete flow of an initialization phase, including:
S301, manually electing an arbitrary node;
Specifically, once the servers, infrastructure, and middleware in the whole system environment are ready, network operation and maintenance personnel can designate the Telegraf node on any one server as the Leader node by manually setting the Leader field in the ZooKeeper distributed framework module.
The Leader field is a field in the ZooKeeper distributed framework module that stores identification information of the server where the Leader node is located, for example the host name or IP address of the server. Each Telegraf node can learn which node is the Leader node from the information in this field.
S302, starting a Telegraf node;
Specifically, the initialization field in the ZooKeeper distributed framework module is used to identify whether the system has completed the initialization procedure. Thus, at the beginning of an initialization task, the initialization field in the ZooKeeper distributed framework module needs to be manually set to false to indicate that the initialization task is not complete.
Meanwhile, the Telegraf nodes deployed on the different servers are generally script programs written in a lightweight scripting language such as Bash or Python. Thus, to start the script programs of multiple Telegraf nodes at the same time during the initialization phase, the Telegraf nodes may be started in batch by means of an automated operation and maintenance tool such as Ansible.
S303, registering a temporary field;
Specifically, the ZooKeeper distributed framework module is the target container for state synchronization in the system, and the temporary fields that the Telegraf nodes register on it are what synchronize state between the different Telegraf nodes. Therefore, after a Telegraf node is started, it registers its temporary field in the ZooKeeper distributed framework module under identification information such as the host name or IP address of the server where it is located.
S304, obtaining Leader information;
Specifically, since the Leader field in the ZooKeeper distributed framework module is set manually by the network operation and maintenance personnel, a Telegraf node does not know in advance whether it is the Leader node designated in the initialization task. Therefore, each Telegraf node reads the Leader information from ZooKeeper to find out.
S305, starting a Telegraf subprocess by a Leader node;
Specifically, after a Telegraf node reads the Leader information from ZooKeeper, if it finds that it is itself the Leader node, it starts its corresponding Telegraf subprocess. In this way, the Telegraf node designated by the network operation and maintenance personnel in the initialization task carries out the data acquisition work.
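Steps S304 and S305 amount to comparing the Leader field with the node's own server identity and starting a child process on a match. The sketch below is illustrative only: `start_if_leader`, `stop`, and the host names are hypothetical, and `STAND_IN_COMMAND` launches a sleeping Python interpreter as a placeholder for the real Telegraf command line, which is not given in the text.

```python
import subprocess
import sys
from typing import Optional

# Placeholder (assumption): a real deployment would launch Telegraf with
# its configuration here instead of a sleeping Python interpreter.
STAND_IN_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]


def start_if_leader(leader_field: str, my_host: str) -> Optional[subprocess.Popen]:
    """Start the child process only when the Leader field names this server."""
    if leader_field != my_host:
        return None  # candidate nodes do not collect data
    return subprocess.Popen(STAND_IN_COMMAND)


def stop(child: subprocess.Popen) -> None:
    """Stop the child process, as the parent node manages its lifecycle."""
    child.terminate()
    child.wait(timeout=5)


leader_child = start_if_leader("host-a", "host-a")
candidate_child = start_if_leader("host-a", "host-b")
running = leader_child is not None and leader_child.poll() is None
if leader_child is not None:
    stop(leader_child)
```

Keeping the `Popen` handle in the parent is what lets the Telegraf node later stop the subprocess, for instance when it is no longer the Leader node.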
S306, monitoring an election event message queue;
Specifically, while the initialization task designates the current Leader node to acquire data, preparation is also needed for election events that may be started later. Each Telegraf node needs to learn promptly when an election event occurs, and does so by listening for the trigger message of the election event.
Specifically, each Telegraf node starts a consumer thread which may, for example, use the LPOP/BRPOP mode in Redis to listen on the election event blocking queue in Redis, which contains no messages at this point. Once a trigger message of the election event is present in the election event blocking queue, it is detected by the Telegraf nodes listening on the queue.
S307, initial state checking;
Specifically, the Telegraf subprocess is the process that directly performs data acquisition. To make sure that nothing abnormal has happened after the Telegraf subprocess starts, the Leader node, after starting it, scans the subprocess's log file to judge whether the subprocess is running normally.
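A log scan of this kind can be as simple as looking for failure markers in the lines written so far. The sketch below is a hypothetical illustration: the function `subprocess_healthy`, the marker strings, and the sample lines are placeholders invented for the example and do not reproduce the actual Telegraf log format, which the text does not specify.

```python
from typing import Iterable, Tuple

# Placeholder markers (assumption); adapt them to the real log format.
DEFAULT_MARKERS: Tuple[str, ...] = ("ERROR", "FATAL")


def subprocess_healthy(
    log_lines: Iterable[str],
    markers: Tuple[str, ...] = DEFAULT_MARKERS,
) -> bool:
    """Return False as soon as any log line contains a failure marker."""
    return not any(marker in line for line in log_lines for marker in markers)


# Invented sample lines, for illustration only.
clean_log = ["INFO agent started", "INFO plug-ins loaded"]
broken_log = ["INFO agent started", "ERROR cannot open configuration"]
clean_ok = subprocess_healthy(clean_log)
broken_ok = subprocess_healthy(broken_log)
```

The result of such a check is what the Leader node would write into its temporary field in the next step, or use to trigger the election event again if startup was abnormal.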
S308, updating the temporary field state;
Specifically, after a Telegraf node has registered its temporary field with the ZooKeeper distributed framework module, the state of that temporary field needs to be updated. If the Telegraf node is the Leader node, its temporary field is used to store the last reporting time of the Leader node; if the Telegraf node is not the Leader node, its temporary field is used to store the health state of the node. A value of true indicates that the node is currently running normally and is eligible to be elected as the Leader node; otherwise, the value is false.
Therefore, in the step of updating the temporary field state, the Leader node can update the health state of its temporary field on the ZooKeeper distributed framework module to true once the initial state check has passed. The other nodes only need to confirm that they are listening on the election event message queue successfully before setting their health state to true. If a node cannot listen normally, or the Leader node cannot start the Telegraf subprocess normally, the corresponding health state is set to false.
S309, detecting and executing abnormal alarm.
Specifically, after a preset time has elapsed since each Telegraf node was started, for example 30 seconds, the Telegraf node that obtained the distributed lock starts an abnormal alarm monitoring thread to scan the health value of each temporary field on the ZooKeeper distributed framework module. If a false value is found, an alarm is triggered and manual intervention is required to resolve the system environment problem. Otherwise, the initialization field of ZooKeeper is set to true, marking the initialization of the data acquisition system as complete.
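The scan performed by the alarm monitoring thread could be sketched as below. Plain dictionaries stand in for the ZooKeeper fields, and the function names and the "initialized" key are hypothetical.

```python
from typing import Dict, List


def find_unhealthy_nodes(health_fields: Dict[str, bool]) -> List[str]:
    """Return the nodes whose health value is false, in sorted order."""
    return sorted(node for node, healthy in health_fields.items() if not healthy)


def complete_initialization(zk_state: dict, health_fields: Dict[str, bool]) -> List[str]:
    """Mark initialization complete only when no false value is found.

    The returned list names the nodes that would trigger an alarm.
    """
    unhealthy = find_unhealthy_nodes(health_fields)
    if not unhealthy:
        zk_state["initialized"] = True
    return unhealthy


ok_state = {}
ok_alarms = complete_initialization(ok_state, {"node-a": True, "node-b": True})

bad_state = {}
bad_alarms = complete_initialization(
    bad_state, {"node-a": True, "node-b": False, "node-c": False}
)
```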
According to the data acquisition method provided by this embodiment of the invention, the Telegraf node responsible for data acquisition is designated by the initialization task, and a state synchronization and message monitoring mechanism is implemented among the plurality of Telegraf nodes. This ensures that the data acquisition task runs normally and avoids the loss of real-time monitoring data.
On the basis of any one of the above embodiments, Fig. 4 is a flowchart of a data acquisition method according to an embodiment of the present invention. As shown in Fig. 4, the method covers the complete process of the election event phase and includes:
S401, judging whether an election event is triggered;
to prevent data acquisition from failing because the server hosting the original Leader node goes down or suffers a network failure, the embodiment of the invention elects a new Leader node to acquire data in such cases. First, each candidate node is controlled to determine whether a trigger condition for an election event among the plurality of Telegraf nodes is satisfied. The candidate nodes are the Telegraf nodes other than the Leader node. The Leader node, like the candidate nodes, can also judge the trigger condition of the election event; however, the new Leader node chosen by the election event usually comes from the candidate nodes.
Further, the embodiment of the present invention includes at least two trigger conditions for the election event: the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference between the current time and the last reporting time stored by the Leader node in that temporary field exceeds a preset threshold. If either condition is met, the trigger condition of the election event is reached and a new Leader node needs to be elected.
Specifically, for the first trigger condition, each Telegraf node in the embodiment of the present invention, including the Leader node, registers a corresponding temporary field in the ZooKeeper distributed framework module. If the temporary field of the Leader node changes from present to absent, the candidate nodes receive a notification event from the ZooKeeper distributed framework module through the event notification mechanism of ZooKeeper. This indicates that the Leader node no longer exists in the system; possible causes include the server hosting the Leader node going down, or the Leader node losing its connection to the ZooKeeper distributed framework module because of a network failure.
Specifically, for the second trigger condition, the Leader node stores the last reporting time in its temporary field on the ZooKeeper distributed framework module and updates it at a preset frequency as the data acquisition task proceeds. Meanwhile, each candidate node starts a timed checking thread that reads the last reporting time from the ZooKeeper distributed framework module. If the difference between the current time and the last reporting time exceeds the preset threshold, reporting by the Telegraf subprocess managed by the Leader node is suffering from problems such as network congestion or packet loss, and this is also a situation in which an election event needs to be triggered.
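The two trigger conditions can be combined into a single check. In the sketch below, a value of None represents a missing temporary field for the Leader node; the function name, the parameter names, and the use of seconds as the time unit are assumptions.

```python
from typing import Optional


def should_trigger_election(
    last_report_time: Optional[float],
    now: float,
    threshold_seconds: float,
) -> bool:
    """Return True when either trigger condition of the election event holds."""
    # Condition 1: the temporary field of the Leader node does not exist.
    if last_report_time is None:
        return True
    # Condition 2: the last reporting time is older than the preset threshold.
    return now - last_report_time > threshold_seconds


missing_field = should_trigger_election(None, now=120.0, threshold_seconds=10.0)
fresh_report = should_trigger_election(115.0, now=120.0, threshold_seconds=10.0)
stale_report = should_trigger_election(100.0, now=120.0, threshold_seconds=10.0)
```

A candidate node would call such a check both from its ZooKeeper notification handler (first condition) and from its timed checking thread (second condition).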
S402, triggering an election event to a Redis blocking queue;
specifically, after a candidate node determines that the trigger condition of the election event among the plurality of Telegraf nodes is satisfied, it needs to send a trigger message of the election event to the election event blocking queue of the Redis database. The election event blocking queue is the message container that holds trigger messages of election events among the plurality of Telegraf nodes: a Telegraf node sends the trigger message to this queue, and the sender can be either the Leader node or a candidate node.
Only one trigger message is needed when an election event is triggered, so the trigger message must be unique. To prevent several Telegraf nodes from sending trigger messages to the election event blocking queue at the same time, the Redis database also includes a distributed lock that the Telegraf nodes must contend for. Only the Telegraf node that acquires the distributed lock can send the trigger message to the election event blocking queue, and it releases the lock after sending, which keeps the trigger message unique.
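The lock-then-send pattern can be illustrated with an in-process lock. In this sketch threading.Lock stands in for the Redis distributed lock and queue.Queue for the blocking queue. The check for an already pending message inside the lock is an assumption added here so that a node acquiring the lock after its release does not send a second message; the source only states that the lock holder sends and then releases the lock.

```python
import queue
import threading


def try_send_trigger(node_id, lock, event_queue):
    """Send the trigger message if this node wins the lock; return True on send."""
    if not lock.acquire(blocking=False):
        return False              # another node currently holds the lock
    try:
        if not event_queue.empty():
            return False          # assumption: a trigger message is already pending
        event_queue.put("election-trigger:" + node_id)
        return True
    finally:
        lock.release()            # release the lock once sending is done


lock = threading.Lock()           # stand-in for the Redis distributed lock
event_queue = queue.Queue()       # stand-in for the election event blocking queue
results = []


def worker(node_id):
    results.append(try_send_trigger(node_id, lock, event_queue))


threads = [threading.Thread(target=worker, args=(n,)) for n in ("node-a", "node-b", "node-c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread wins, exactly one trigger message ends up in the queue.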
S403, nodes meeting the candidate qualification contend for the vote;
specifically, after a Telegraf node sends a trigger message of an election event to the election event blocking queue, only one of the plurality of Telegraf nodes can receive the trigger message, because the listening mode is exclusive. That is, once one Telegraf node has received the trigger message, the other nodes can no longer receive it. Only that node wins the vote in the election event and becomes the selected node, that is, the new Leader node.
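The exclusive delivery described above can be demonstrated with several consumers blocking on one queue. As before, queue.Queue is only a stand-in for the Redis blocking queue, and the names contend and run_election are hypothetical.

```python
import queue
import threading


def contend(node_id, event_queue, winners, timeout=0.3):
    """Block on the queue; the node that pops the message wins the election."""
    try:
        event_queue.get(timeout=timeout)
    except queue.Empty:
        return                    # another node already consumed the message
    winners.append(node_id)


def run_election(node_ids):
    event_queue = queue.Queue()
    winners = []
    threads = [
        threading.Thread(target=contend, args=(n, event_queue, winners))
        for n in node_ids
    ]
    for t in threads:
        t.start()
    event_queue.put("election-trigger")   # exactly one trigger message
    for t in threads:
        t.join()
    return winners


winners = run_election(["node-a", "node-b", "node-c"])
```

Which node wins is not deterministic, but the winners list always contains exactly one entry because a message popped by one consumer is no longer available to the others.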
S404, starting a Telegraf subprocess;
specifically, after a Telegraf node is selected as the new Leader node, it starts its own Telegraf subprocess. When the original Leader node is abnormal, this subprocess takes over data acquisition from the Telegraf subprocess of the original Leader node, so that data acquisition continues without interruption.
S405, discarding the candidate qualification;
specifically, after the selected node starts the Telegraf subprocess, it needs to detect whether the subprocess started successfully. If the startup is abnormal, for example the Telegraf subprocess does not start normally, or log scanning after startup shows that reports cannot be received normally, the node terminates the Telegraf subprocess and the corresponding consumer thread, that is, it discards its candidate qualification. It then immediately generates a trigger message for a new election event in the election event blocking queue in the Redis database and at the same time updates the health state of its temporary field to false.
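The failure path of this step can be sketched as follows. A dictionary stands in for the ZooKeeper temporary fields and queue.Queue for the Redis blocking queue; the function name and message text are hypothetical, and terminating the subprocess and consumer thread is represented only by a comment.

```python
import queue


def handle_startup_result(node_id, started_ok, event_queue, health_fields):
    """Return True if the node keeps the Leader role, False if it gives it up."""
    if started_ok:
        return True
    # Here the real node would terminate its Telegraf subprocess and its
    # consumer thread, which is what discarding the candidate qualification means.
    health_fields[node_id] = False           # mark this node as unhealthy
    event_queue.put("election-trigger")      # start a new election event
    return False


event_queue = queue.Queue()
health_fields = {"node-a": True, "node-b": True}

kept_role = handle_startup_result("node-a", False, event_queue, health_fields)
```

After the call, node-a is marked unhealthy and a fresh trigger message is waiting for the remaining candidates.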
S406, synchronizing the ZooKeeper state information;
specifically, if the selected node detects that the Telegraf subprocess started successfully, the selected node formally becomes the new Leader node and starts data acquisition. At the same time, the leader field on the ZooKeeper distributed framework module needs to be updated so that it contains the identification information of the server hosting the selected node. In addition, as the new Leader node, the selected node also needs to update the last reporting time in its temporary field.
S407, the original Leader node exits smoothly;
specifically, after the selected node updates the leader field on the ZooKeeper distributed framework module, the original Leader node receives an event notification from the ZooKeeper distributed framework module. The original Leader node then checks whether its own Telegraf subprocess is still running and, if so, closes it. This achieves a smooth handover while keeping a single Telegraf subprocess in the system.
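The handover can be modelled with a small in-memory watcher. The LeaderField class below only imitates the notification behaviour attributed to ZooKeeper in the text and simplifies how real ZooKeeper watches work; all class, method, and server names are hypothetical.

```python
class TelegrafNode:
    """Minimal model of a node that may be running a Telegraf subprocess."""

    def __init__(self, server_id, subprocess_running=False):
        self.server_id = server_id
        self.subprocess_running = subprocess_running

    def on_leader_changed(self, new_leader_id):
        # The original Leader closes its subprocess if it is still running.
        if new_leader_id != self.server_id and self.subprocess_running:
            self.subprocess_running = False   # stand-in for stopping the process


class LeaderField:
    """In-memory stand-in for the leader field and its change notifications."""

    def __init__(self, value):
        self.value = value
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def set(self, value):
        self.value = value
        for callback in self._watchers:
            callback(value)                   # notify every registered node


old_leader = TelegrafNode("server-1", subprocess_running=True)
new_leader = TelegrafNode("server-2", subprocess_running=True)

leader_field = LeaderField("server-1")
leader_field.watch(old_leader.on_leader_changed)
leader_field.watch(new_leader.on_leader_changed)
leader_field.set("server-2")                  # the selected node updates the field
```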
S408, detecting and executing abnormal alarm;
specifically, this step is optionally executed after step S407. The selected node checks the number of temporary fields on the ZooKeeper distributed framework module and the number of health values that are true. If either value is smaller than a specified threshold, the system is currently in an abnormal state, and an alarm is triggered to remind operation and maintenance personnel to intervene in advance.
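One possible reading of this check is sketched below: the number of temporary fields and the number of true health values are each compared against a threshold. The thresholds, the function name, and the alarm texts are assumptions, since the source does not give concrete values.

```python
from typing import Dict, List


def check_cluster_state(
    health_fields: Dict[str, bool],
    min_fields: int,
    min_healthy: int,
) -> List[str]:
    """Return alarm messages; an empty list means no abnormal state was found."""
    alarms = []
    if len(health_fields) < min_fields:
        alarms.append("too few temporary fields")
    healthy_count = sum(1 for healthy in health_fields.values() if healthy)
    if healthy_count < min_healthy:
        alarms.append("too few healthy nodes")
    return alarms


normal = check_cluster_state(
    {"node-a": True, "node-b": True, "node-c": True}, min_fields=3, min_healthy=2
)
degraded = check_cluster_state(
    {"node-a": True, "node-b": False}, min_fields=3, min_healthy=2
)
```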
According to the data acquisition method provided by this embodiment of the invention, a new Leader node is elected to acquire data when the current Leader node is abnormal, which avoids the loss of real-time monitoring data.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A data acquisition system comprising a plurality of servers, a ZooKeeper distributed framework module, and a Redis database, wherein:
the plurality of servers are all provided with Telegraf nodes; the data acquisition system comprises only one Telegraf node serving as a Leader node, and the Leader node is used for starting a Telegraf subprocess to acquire data;
the Telegraf node comprises an event monitoring node and a message acquisition node; the event monitoring node is used for monitoring whether the triggering condition of the Leader node election event is met or not, and sending a message for triggering the Leader node election event to an election event blocking queue when the triggering condition of the Leader node election event is met; the Leader node election event is used for triggering a Telegraf node contained in the data acquisition system to elect a Telegraf node as a Leader node;
the message acquisition node is configured to, if a message triggering the Leader node election event is obtained from the election event blocking queue, start a corresponding Telegraf subprocess as the elected Leader node to perform data acquisition;
the ZooKeeper distributed framework module comprises a temporary field registered on the ZooKeeper distributed framework module by the Telegraf node; the temporary field is used as a basis for the event monitoring node to monitor whether the triggering condition of the Leader node election event is met;
the Redis database comprises an election event blocking queue and is used for storing a message triggering the Leader node election event.
2. The data acquisition system of claim 1, wherein the Redis database further comprises a distributed lock for determining the Telegraf node that sends a trigger message for the election event to the election event blocking queue.
3. The data acquisition system of claim 1, wherein the ZooKeeper distributed framework module further comprises a Leader field for storing identification information of a server where the Leader node is located.
4. A data acquisition method applied to the data acquisition system of any one of claims 1 to 3, comprising:
monitoring whether a triggering condition of a Leader node election event is met or not, and sending a message for triggering the Leader node election event to an election event blocking queue when the triggering condition of the Leader node election event is met; the Leader node election event is used for triggering a Telegraf node contained in the data acquisition system to elect a Telegraf node as a Leader node;
if the message triggering the Leader node election event is obtained from the election event blocking queue, the Telegraf node that obtained the message serves as the elected Leader node and starts a corresponding Telegraf subprocess to acquire data.
5. The method for collecting data according to claim 4, wherein the trigger condition of the Leader node election event is specifically:
and the temporary field registered by the Leader node on the ZooKeeper distributed framework module does not exist, or the difference value between the last report receiving time and the current time stored in the temporary field registered by the Leader node on the ZooKeeper distributed framework module exceeds a preset threshold.
6. The method of claim 4, wherein the sending a message to the election event blocking queue that triggers a Leader node election event comprises:
a plurality of Telegraf nodes contend for a distributed lock of the Redis database;
and the Telegraf node that wins the distributed lock sends a message triggering the Leader node election event to the election event blocking queue of the Redis database.
7. The method for data collection according to claim 4, wherein after the selected node as the Leader node starts the corresponding Telegraf sub-process to collect data, the method further comprises:
and if the Telegraf subprocess starts abnormally, triggering the Leader node election event again.
8. The method of data acquisition according to claim 4, further comprising:
and the ZooKeeper distributed framework module sends an event notification for closing a Telegraf sub-process corresponding to the Leader node.
9. The method of data acquisition according to claim 4, further comprising:
and determining one Telegraf node as a Leader node in the plurality of Telegraf nodes to acquire data by setting a Leader field in the ZooKeeper distributed framework module.
10. The method for data collection according to claim 9, wherein the determining one Telegraf node as a Leader node among a plurality of Telegraf nodes by setting a leader field in the ZooKeeper distributed framework module includes:
setting a leader field in a ZooKeeper distributed framework module as identification information of a server where one Telegraf node in a plurality of Telegraf nodes is located;
and the Telegraf node reads a leader field in the ZooKeeper distributed framework module, and if the leader field is identification information of a server where the Telegraf node is located, the Telegraf node starts a corresponding Telegraf subprocess to acquire data.
CN202010529803.1A 2020-06-11 2020-06-11 Data acquisition system and method Active CN111722980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010529803.1A CN111722980B (en) 2020-06-11 2020-06-11 Data acquisition system and method

Publications (2)

Publication Number Publication Date
CN111722980A CN111722980A (en) 2020-09-29
CN111722980B true CN111722980B (en) 2023-10-20

Family

ID=72567968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010529803.1A Active CN111722980B (en) 2020-06-11 2020-06-11 Data acquisition system and method

Country Status (1)

Country Link
CN (1) CN111722980B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104065741A (en) * 2014-07-04 2014-09-24 用友软件股份有限公司 Data collection system and method
CN107895009A (en) * 2017-11-10 2018-04-10 北京国信宏数科技有限责任公司 One kind is based on distributed internet data acquisition method and system
CN108512719A (en) * 2018-03-02 2018-09-07 南京易捷思达软件科技有限公司 A kind of Integrative resource monitoring system based on cloud platform of increasing income
CN109088908A (en) * 2018-06-06 2018-12-25 武汉酷犬数据科技有限公司 A kind of the distributed general collecting method and system of network-oriented
CN110247954A (en) * 2019-05-15 2019-09-17 南京苏宁软件技术有限公司 A kind of dispatching method and system of distributed task scheduling

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9596301B2 (en) * 2006-09-18 2017-03-14 Hewlett Packard Enterprise Development Lp Distributed-leader-election service for a distributed computer system

Also Published As

Publication number Publication date
CN111722980A (en) 2020-09-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant