CN116800739A - Data acquisition method and device, electronic equipment and storage medium - Google Patents

Data acquisition method and device, electronic equipment and storage medium

Info

Publication number
CN116800739A
CN116800739A CN202210992064.9A CN202210992064A CN116800739A CN 116800739 A CN116800739 A CN 116800739A CN 202210992064 A CN202210992064 A CN 202210992064A CN 116800739 A CN116800739 A CN 116800739A
Authority
CN
China
Prior art keywords
acquisition
node
collection
nodes
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210992064.9A
Other languages
Chinese (zh)
Inventor
黄刚
陈学平
范亚琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN202210992064.9A
Publication of CN116800739A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention is suitable for the technical field of computers, and provides a data acquisition method, a device, electronic equipment and a storage medium, wherein the data acquisition method comprises the following steps: calculating the weight of each acquisition node based on the hardware configuration information of each acquisition node in the at least two acquisition nodes; the weight represents the probability of the cluster being distributed to the acquisition node; the acquisition node is used for acquiring data of a corresponding cluster; determining the acquisition range of each acquisition node on the hash ring according to the weight; and distributing the acquisition nodes for at least one cluster according to the acquisition range of each acquisition node on the hash ring.

Description

Data acquisition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data acquisition method, a data acquisition device, an electronic device, and a storage medium.
Background
The related art uses Prometheus as a cluster monitoring scheme: Prometheus deploys an exporter for each cluster and collects the cluster's data through that exporter. Because an exporter must be allocated to every cluster, exporter resources cannot be fully utilized, and the load cannot be balanced across all exporters.
Disclosure of Invention
In order to solve the above problems, embodiments of the present invention provide a data acquisition method, apparatus, electronic device, and storage medium, so as to at least solve the problems in the related art that exporter resources cannot be fully utilized and that load cannot be balanced across exporters.
The technical scheme of the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a data acquisition method, where the method includes:
calculating the weight of each acquisition node based on the hardware configuration information of each acquisition node in the at least two acquisition nodes; the weight represents the probability of the cluster being distributed to the acquisition node; the acquisition node is used for acquiring data of a corresponding cluster;
determining the acquisition range of each acquisition node on the hash ring according to the weight;
and distributing the acquisition nodes for at least one cluster according to the acquisition range of each acquisition node on the hash ring.
In the above scheme, the distributing the collection nodes for at least one cluster according to the collection range of each collection node on the hash ring includes:
calculating hash values corresponding to all clusters, and determining the positions of all clusters on the hash ring according to the hash values;
and matching the positions of the clusters on the hash ring with the acquisition ranges of the acquisition nodes respectively, and distributing the acquisition nodes for the clusters according to the matching result.
In the above scheme, the allocating the collection node for each cluster according to the matching result includes:
if the matching result represents that the position of the cluster on the hash ring is positioned in the acquisition range of the first acquisition node, the cluster is distributed to the first acquisition node; the first acquisition node is any one of the at least two acquisition nodes; the first collection node is used for collecting data of the cluster.
In the above scheme, the determining the collection range of each collection node on the hash ring according to the weight includes:
determining the length of an arc line of the corresponding acquisition node on the hash ring according to the weight; the arc represents the acquisition range of the corresponding acquisition node on the hash ring; wherein the sum of the weights of all the at least two collection nodes is equal to 1.
In the above scheme, the method further comprises:
under the condition that the load value of the second acquisition node is monitored to be larger than a set value, a third acquisition node is established; the second acquisition node is any one of the at least two acquisition nodes;
calculating weights of the at least two acquisition nodes and the third acquisition node;
determining the acquisition ranges of the at least two acquisition nodes and the third acquisition node on the hash ring according to the weight;
and reallocating the acquisition nodes for the at least one cluster according to the acquisition ranges of the at least two acquisition nodes and the third acquisition node on the hash ring.
In the above solution, after allocating the collection nodes to at least one cluster according to the collection ranges of the collection nodes on the hash ring, the method further includes:
determining, according to hardware configuration information of a fourth acquisition node and the total number of clusters assigned to it, the number of clusters that the fourth acquisition node acquires in each batch; the fourth acquisition node acquires data from its assigned clusters in batches; the fourth acquisition node is any one of the at least two acquisition nodes.
In the above solution, the hardware configuration information includes: the number of processor cores and the memory capacity of the acquisition node;
correspondingly, the calculating the weight of each collection node based on the hardware configuration information of each collection node in the at least two collection nodes comprises the following steps:
and respectively calculating the weight of each acquisition node based on the processor core number and the memory capacity of each acquisition node.
In a second aspect, an embodiment of the present invention provides a data acquisition device, including:
the computing module is used for computing the weight of each acquisition node based on the hardware configuration information of each acquisition node in the at least two acquisition nodes; the weight represents the probability of the cluster being distributed to the acquisition node; the acquisition node is used for acquiring data of a corresponding cluster;
the determining module is used for determining the acquisition range of each acquisition node on the hash ring according to the weight;
and the distribution module is used for distributing the collection nodes for at least one cluster according to the collection range of each collection node on the hash ring.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is configured to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the steps of the data acquisition method provided in the first aspect of the embodiment of the present invention.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the steps of the data acquisition method provided in the first aspect of the embodiment of the present invention.
The embodiment calculates the weight of each acquisition node based on the hardware configuration information of each acquisition node in at least two acquisition nodes, determines the acquisition range of each acquisition node on the hash ring according to the weight, and distributes the acquisition nodes for at least one cluster according to the acquisition range of each acquisition node on the hash ring. The weight represents probability that the cluster is distributed to the acquisition node, and the acquisition node is used for acquiring data of the corresponding cluster. According to the application, the acquisition range of the acquisition node on the hash ring is determined through the weight of the acquisition node, the acquisition node is distributed for the cluster according to the acquisition range, so that dynamic load adjustment of the acquisition node can be realized, and load balance of all the acquisition nodes is realized as a whole. The application can fully utilize the resources of the acquisition nodes, flexibly adjust the load of each acquisition node according to the weight, and avoid the condition that a single acquisition node is down due to overhigh load.
Drawings
FIG. 1 is a schematic diagram of a data acquisition system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another data acquisition system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another data acquisition system according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of an implementation of a data acquisition method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an implementation flow of another data acquisition method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an implementation flow of another data acquisition method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another data acquisition system according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a cluster structure according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a data acquisition device according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Prometheus is an open-source monitoring system, and the related art uses Prometheus as a cluster data collection and monitoring scheme. A program installed in the cluster exposes an HTTP access address to Prometheus, from which the current monitoring sample data can be obtained. Such a program is called an exporter, and an instance of an exporter is called a Target. In a broad sense, any program that can provide monitoring data to Prometheus can be called an exporter. Prometheus periodically pulls monitoring data from these Targets by polling and stores it in a database.
With the exponential growth of mobile cloud product users, the above conventional data acquisition and monitoring scheme has the following problems:
1. The related art needs to deploy an exporter for each cluster to acquire data. The deployment cost of exporter resources is high, and the exporter is highly invasive to the cluster, heavyweight, difficult to update, and inconsistent with the microservice concept.
2. A scheme in which a single exporter collects from a single cluster cannot fully utilize exporter resources, which wastes resources. Moreover, the load of each exporter is difficult to manage, load balancing across all exporters cannot be achieved, and a single exporter can easily go down because its load is too high.
3. The monitoring rules are rigid and inflexible: the acquisition mode and the types of data acquired cannot be adjusted as required, so the changing market requirements of mobile cloud cannot be met.
Aiming at the defects of the related technology, the embodiment of the invention provides a data acquisition method which can improve the utilization rate of resources and realize automatic load balancing. In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Fig. 1 is a schematic structural diagram of a data acquisition system according to an embodiment of the present invention, where the data acquisition system includes: the system comprises a service discovery component, a unified data acquisition component, an etcd database and an acquired service cluster.
Wherein the collected service cluster comprises a plurality of Redis clusters, which are only an example, and other types of clusters can be collected service clusters. The unified data acquisition component comprises a plurality of acquisition service copies, wherein the acquisition service is used for acquiring data in the Redis cluster, and then the acquired data is injected into the etcd database.
As shown in FIG. 2, each newly created acquisition service and each newly created Redis cluster registers with etcd. The acquisition service continuously watches its monitoring list in etcd and acquires data from the Redis clusters in that list. Each acquisition service corresponds to one monitoring list in etcd.
As shown in FIG. 3, the service discovery component continuously watches the Redis list and the acquisition service list in etcd to discover new Redis clusters or acquisition services in real time. The service discovery component automatically performs load balancing according to the load of the acquisition services and automatically adjusts the Redis clusters in the monitoring list corresponding to each acquisition service; see the following embodiments for details.
FIG. 4 is a schematic implementation flow chart of a data acquisition method according to an embodiment of the present invention. The execution body of the data acquisition method is an electronic device, such as a desktop computer, a notebook computer, or a server. In the embodiment of FIG. 1, the execution body of the data acquisition method is the service discovery component in FIG. 1. Referring to FIG. 4, the data acquisition method includes:
S401, calculating the weight of each acquisition node based on hardware configuration information of each acquisition node in at least two acquisition nodes; the weight represents the probability of the cluster being distributed to the acquisition node; the collection node is used for collecting data of a corresponding cluster.
Here, the collection node corresponds to the collection service in the above embodiment, and the collection node may be a virtual node, for example, the collection node may be a pod in a k8s (Kubernetes) cluster. The collection node is used for collecting data of clusters, such as Redis clusters, and one collection node can collect data of a plurality of Redis clusters at the same time.
In this embodiment, hardware resources are allocated to each acquisition node, and the hardware configuration information of an acquisition node may include the number of processor cores and the memory capacity of the acquisition node. The more hardware resources (e.g., more memory capacity, more processor cores) an acquisition node is allocated, the more clusters it can acquire data from.
In this embodiment, the weight characterizes the probability that a cluster is assigned to an acquisition node, that is, the probability that the data of the cluster is acquired by that acquisition node. When a cluster is being assigned, the greater the weight of an acquisition node, the more likely the cluster is assigned to it; after all clusters have been assigned, an acquisition node with a greater weight has been assigned more clusters.
The weight of each acquisition node is calculated based on its hardware configuration information. When the hardware configuration information is the number of processor cores and the memory capacity of the acquisition node, calculating the weight of each acquisition node based on the hardware configuration information of each of the at least two acquisition nodes correspondingly includes:
and respectively calculating the weight of each acquisition node based on the processor core number and the memory capacity of each acquisition node.
For example, assume a set of acquisition nodes $bs_i$, where $i \in \{1, 2, \ldots, n\}$, the number of CPU cores of each acquisition node is $C_i$, and its memory capacity is $M_i$. The weight $W_i$ of each acquisition node can be calculated according to the following formula:
[formula not reproduced in this text]
where $\varepsilon$ is a very small positive number; for example, $\varepsilon$ may be set to $10^{-6}$.
Of course, the hardware configuration information may also include more information, such as the hard disk capacity, where selecting the number of processor cores and the memory capacity is only one example.
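As a minimal sketch of S401: the weight formula of the embodiment is not reproduced in this text, so the following Python uses an assumed scoring rule, namely an equal blend of a node's share of all CPU cores and its share of all memory, normalized so that the weights sum to 1. The function name `node_weights`, the 50/50 blend, the role given to ε, and the sample node specifications are all illustrative assumptions rather than the formula of the embodiment.

```python
EPSILON = 1e-6  # small positive constant; ASSUMED here to guard against division by zero


def node_weights(nodes):
    """Return {node_name: weight}, with the weights summing to 1.

    `nodes` maps a node name to (cpu_cores, memory_capacity).
    ASSUMPTION: the score blends the node's share of cores and of memory
    equally; the formula of the embodiment is not available in this text.
    """
    if len(nodes) < 2:
        raise ValueError("at least two acquisition nodes are expected")
    total_cores = sum(cores for cores, _ in nodes.values())
    total_mem = sum(mem for _, mem in nodes.values())
    scores = {
        name: 0.5 * cores / (total_cores + EPSILON) + 0.5 * mem / (total_mem + EPSILON)
        for name, (cores, mem) in nodes.items()
    }
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("nodes must have some CPU cores or memory")
    # Normalize so the weights can be read directly as fractions of the hash ring.
    return {name: score / total for name, score in scores.items()}


# Hypothetical nodes: (processor cores, memory capacity).
weights = node_weights({"A": (2, 4), "B": (4, 8), "C": (8, 16)})
```

Under this assumed rule, a node with twice the cores and twice the memory of another receives twice its weight, which matches the stated intent that better-provisioned acquisition nodes are assigned more clusters.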
S402, determining the acquisition range of each acquisition node on the hash ring according to the weight.
The consistent hashing algorithm is an improved version of the ordinary hash algorithm. The hash function itself is unchanged, but the linear hash space of the ordinary algorithm is replaced by a circular hash space. The conventional consistent hashing algorithm hashes the Internet Protocol (IP) address or host name of each acquisition node and determines the position of the acquisition node on the hash ring from the hash value. In the conventional method, the position of an acquisition node on the hash ring is fixed, and its acquisition range equals the arc between it and the previous acquisition node on the ring. A newly added acquisition node can only be inserted between two existing acquisition nodes, which affects only the load of one original acquisition node (the one that follows the new node on the ring); the load of the other acquisition nodes does not change. It is therefore difficult for the conventional method to adjust the load of each acquisition node, the load of all acquisition nodes cannot be adjusted as a whole, and the load cannot be adjusted dynamically according to the resources of the acquisition nodes.
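For contrast, the conventional scheme described above can be sketched as follows. This is an illustration only: the use of MD5 truncated to 32 bits, the function names, and the sample addresses are assumptions. Each node sits at the hash of its address, and a cluster is owned by the first node at or after the cluster's position, wrapping around the ring.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def ring_position(key):
    """Map a string to a position in [0, RING_SIZE) using the first 4 bytes of MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def build_classic_ring(node_addresses):
    """Place each node at the hash of its address; return sorted (position, address) pairs."""
    return sorted((ring_position(addr), addr) for addr in node_addresses)


def classic_owner(ring, cluster_name):
    """Return the first node at or after the cluster's position, wrapping past the end."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, ring_position(cluster_name))
    return ring[idx % len(ring)][1]


# Hypothetical node addresses and cluster names.
old_ring = build_classic_ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
new_ring = build_classic_ring(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
clusters = [f"redis-cluster-{i}" for i in range(50)]
moved = [c for c in clusters if classic_owner(old_ring, c) != classic_owner(new_ring, c)]
```

Every cluster in `moved` ends up on the new node `10.0.0.4`; no cluster moves between the original nodes, which is the limitation the weighted scheme of this embodiment is meant to address.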
Consistent hashing organizes the entire hash value space into a virtual ring, i.e., a hash ring. In this embodiment, to determine the acquisition range of each acquisition node on the hash ring according to the weight, a hash ring is first constructed. For example, assume a hash function whose hash value space is $0$ to $2^{32}-1$, that is, the hash value is a 32-bit integer. This space is treated as a virtual ring, which yields a hash ring covering the hash values $0$ to $2^{32}-1$.
In this embodiment, the acquisition range of each acquisition node on the hash ring is determined by its weight. The weight calculated in this embodiment is a fraction less than 1, and the weights of all acquisition nodes sum to 1. The hash ring is divided according to the weights, so that the proportion of the hash ring occupied by each acquisition node equals its weight.
In an embodiment, the determining the collection range of each collection node on the hash ring according to the weight includes:
determining the length of an arc line of the corresponding acquisition node on the hash ring according to the weight; the arc represents the acquisition range of the corresponding acquisition node on the hash ring; wherein the sum of the weights of all the at least two collection nodes is equal to 1.
The hash ring is a circle on which all hash values fall. The hash ring is divided according to the weight of each acquisition node: each acquisition node corresponds to one arc on the hash ring, the ratio of the length of that arc to the circumference of the hash ring equals the node's weight, and the lengths of the arcs of all acquisition nodes sum to the circumference of the hash ring.
The arc line is the collection range of the collection node, the larger the weight value is, the longer the arc line of the collection node is, and the larger the collection range is. Thus, when the cluster determines the collection node through the hash ring, the longer the arc the greater the probability that the collection node is assigned to the cluster.
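The division of the ring in S402 can be sketched as follows, assuming a ring of $2^{32}$ positions and weights that already sum to 1. The order in which arcs are laid out (sorted by node name, starting at position 0) and the name `acquisition_ranges` are assumptions; the text only requires that each arc's share of the ring equal the node's weight.

```python
RING_SIZE = 2 ** 32


def acquisition_ranges(weights):
    """Return {node: (start, end)}: half-open arcs that together cover [0, RING_SIZE)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ranges = {}
    start = 0
    cumulative = 0.0
    names = sorted(weights)  # ASSUMED layout order
    for i, name in enumerate(names):
        cumulative += weights[name]
        # Pin the last arc to the end of the ring so rounding cannot leave a gap.
        end = RING_SIZE if i == len(names) - 1 else round(cumulative * RING_SIZE)
        ranges[name] = (start, end)
        start = end
    return ranges


ranges = acquisition_ranges({"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.15, "E": 0.25})
```

Here node C, with weight 0.30, receives an arc covering about 30% of the ring, so a cluster hashed uniformly onto the ring lands in C's range with probability of about 0.30.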
S403, distributing the collection nodes for at least one cluster according to the collection range of each collection node on the hash ring.
Because the hash ring corresponds to a hash value space, each hash value corresponds to a point on the hash ring. A hash function is applied to each of the at least one cluster to obtain its hash value, and the cluster is then mapped onto the hash ring by that hash value. The acquisition node whose acquisition range contains the mapped position is responsible for acquiring the data of the cluster.
Because the weight can represent the resource size of the acquisition nodes, if the number of the acquisition nodes is adjusted or the number of the clusters is adjusted, the application can dynamically adjust the load of the acquisition nodes according to the weight in real time to realize load balancing.
Referring to fig. 5, in an embodiment, the allocating the collection node to at least one cluster according to the collection range of each collection node on the hash ring includes:
S501, calculating hash values corresponding to all clusters, and determining the positions of all clusters on the hash ring according to the hash values.
And calculating a hash value corresponding to each cluster through a preset hash algorithm, and then mapping the clusters to the hash ring through the hash value.
The hash function is chosen by the developer according to the actual environment; for example, the hash algorithm may be the MD5 message-digest algorithm, a cyclic redundancy check (CRC), MurmurHash, or the like.
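As an illustration of S501, two of the hash functions named above are available in the Python standard library. The sketch below maps a cluster identifier to a ring position with each of them; the identifier format and the choice to keep only the first 4 bytes of the MD5 digest are assumptions.

```python
import hashlib
import zlib

RING_SIZE = 2 ** 32


def position_crc32(cluster_id):
    """CRC-32 already yields an unsigned 32-bit value, i.e. a position on the ring."""
    return zlib.crc32(cluster_id.encode("utf-8")) % RING_SIZE


def position_md5(cluster_id):
    """Use the first 4 bytes of the 16-byte MD5 digest as a 32-bit position (assumed truncation)."""
    digest = hashlib.md5(cluster_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# Hypothetical cluster identifier.
pos_a = position_crc32("redis-cluster-42")
pos_b = position_md5("redis-cluster-42")
```

Either function gives a deterministic position for a given cluster, so the same cluster always maps to the same point on the ring as long as the hash function is not changed.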
S502, the positions of the clusters on the hash ring are respectively matched with the acquisition ranges of the acquisition nodes, and the acquisition nodes are distributed for the clusters according to the matching result.
If the matching result represents that the position of the cluster on the hash ring is positioned in the acquisition range of the first acquisition node, the cluster is distributed to the first acquisition node; the first acquisition node is any one of the at least two acquisition nodes; the first collection node is used for collecting data of the cluster.
The acquisition node whose acquisition range contains the cluster's position on the hash ring is responsible for acquiring data from the cluster, and the cluster is written into the monitoring list corresponding to that acquisition node.
For example, suppose 100 Redis clusters need to be assigned acquisition nodes, and there are 5 acquisition nodes in total (A, B, C, D, E) whose weights are 10%, 20%, 30%, 15%, and 25%, respectively. When the 100 Redis clusters are assigned, acquisition node A can be expected to receive about 10 Redis clusters, acquisition node B about 20, acquisition node C about 30, acquisition node D about 15, and acquisition node E about 25. Which Redis cluster is assigned to which acquisition node is determined by the hash function and the hash ring.
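The example above can be sketched end to end as follows, under stated assumptions: MD5 truncated to 32 bits as the hash function, arcs laid out in the order A to E starting at position 0, and hypothetical cluster names. The helper names are illustrative.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def build_boundaries(weights):
    """Return (ends, names), where ends[i] is the exclusive end of the arc of names[i]."""
    names = list(weights)
    ends, cumulative = [], 0.0
    for name in names:
        cumulative += weights[name]
        ends.append(round(cumulative * RING_SIZE))
    ends[-1] = RING_SIZE  # close the ring exactly despite rounding
    return ends, names


def ring_position(cluster_id):
    digest = hashlib.md5(cluster_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def assign(clusters, weights):
    """Write each cluster into the monitoring list of the node whose arc contains it."""
    ends, names = build_boundaries(weights)
    monitoring_lists = {name: [] for name in names}
    for cluster in clusters:
        # bisect_right counts the arc ends that are <= position: the index of the owning arc.
        idx = bisect.bisect_right(ends, ring_position(cluster))
        monitoring_lists[names[idx]].append(cluster)
    return monitoring_lists


weights = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.15, "E": 0.25}
clusters = [f"redis-cluster-{i}" for i in range(100)]
monitoring_lists = assign(clusters, weights)
counts = {name: len(items) for name, items in monitoring_lists.items()}
```

With only 100 clusters, the counts approximate 10, 20, 30, 15, and 25 rather than matching them exactly, because the placement of each cluster depends on its hash value.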
The embodiment calculates the weight of each acquisition node based on the hardware configuration information of each acquisition node in at least two acquisition nodes, determines the acquisition range of each acquisition node on the hash ring according to the weight, and distributes the acquisition nodes for at least one cluster according to the acquisition range of each acquisition node on the hash ring. The weight represents probability that the cluster is distributed to the acquisition node, and the acquisition node is used for acquiring data of the corresponding cluster. According to the application, the acquisition range of the acquisition node on the hash ring is determined through the weight of the acquisition node, the acquisition node is distributed for the cluster according to the acquisition range, so that dynamic load adjustment of the acquisition node can be realized, and load balance of all the acquisition nodes is realized as a whole. The application can fully utilize the resources of the acquisition nodes, flexibly adjust the load of each acquisition node according to the weight, and avoid the condition that a single acquisition node is down due to overhigh load.
Referring to fig. 6, in an embodiment, the method further comprises:
S601, under the condition that the load value of the second acquisition node is monitored to be larger than a set value, creating a third acquisition node; the second acquisition node is any one of the at least two acquisition nodes.
Here, the processor usage and the memory usage of the second acquisition node may be used as the load value. For example, with the set value at 0.9, when the processor usage of the second acquisition node exceeds 90%, the load value of the second acquisition node is determined to be greater than the set value, which indicates that the load of the second acquisition node is too high.
Under the condition that the load of the second acquisition node is too high, a new acquisition node is created, and the newly-built acquisition node is named as a third acquisition node.
S602, calculating weights of the at least two acquisition nodes and the third acquisition node.
S603, determining the acquisition ranges of the at least two acquisition nodes and the third acquisition node on the hash ring according to the weight.
S604, reassigning the collection nodes for the at least one cluster according to the collection ranges of the at least two collection nodes and the third collection node on the hash ring.
After a third acquisition node is newly established, the weight of all the current acquisition nodes is recalculated, the acquisition range of each acquisition node on the hash ring is determined again according to the weight, and then the acquisition nodes are allocated to all the current clusters again according to the acquisition range.
The acquisition nodes are required to be redistributed because the original acquisition nodes reach the load limit, and the newly built acquisition nodes are used for sharing the pressure of the original acquisition nodes. For example, a total of 100 clusters of data are acquired by 5 acquisition nodes before the load of the second acquisition node reaches a threshold. After the load of the second acquisition node reaches the threshold value, an acquisition node is newly established, and 100 clusters of data are acquired through 6 acquisition nodes, so that the load of the original 5 acquisition nodes is reduced, and downtime caused by overhigh load is avoided.
All clusters of all original acquisition nodes are reassigned, rather than only moving clusters of the second acquisition node to the third acquisition node. This is because the clusters are distributed uniformly over the hash ring by the hash function when acquisition nodes are assigned, so if one acquisition node is overloaded, the loads of the other acquisition nodes are likely to be high as well. Therefore, after a new acquisition node is created, reassigning all clusters of all original acquisition nodes reduces the load of the acquisition nodes as a whole, rather than relieving only one node, and achieves load balancing across all acquisition nodes.
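Steps S601 to S604 can be sketched as follows. This is a simplified illustration: the weight rule (an equal blend of core share and memory share), the MD5-based ring position, the node specifications, and all function names are assumptions, and creating the third acquisition node is reduced to adding an entry to a dictionary.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32
LOAD_LIMIT = 0.9  # the "set value" used in the example of the text


def weights_of(specs):
    """ASSUMED weight rule: equal blend of core share and memory share (sums to 1)."""
    total_c = sum(c for c, _ in specs.values())
    total_m = sum(m for _, m in specs.values())
    return {n: 0.5 * c / total_c + 0.5 * m / total_m for n, (c, m) in specs.items()}


def assign(clusters, weights):
    """Divide the ring by weight and place every cluster in the owning node's list."""
    names = list(weights)
    ends, acc = [], 0.0
    for name in names:
        acc += weights[name]
        ends.append(round(acc * RING_SIZE))
    ends[-1] = RING_SIZE
    lists = {name: [] for name in names}
    for cluster in clusters:
        pos = int.from_bytes(hashlib.md5(cluster.encode("utf-8")).digest()[:4], "big")
        lists[names[bisect.bisect_right(ends, pos)]].append(cluster)
    return lists


def rebalance_on_overload(specs, loads, clusters, new_name, new_spec):
    """If any node's load exceeds LOAD_LIMIT, add a node and reassign ALL clusters."""
    if not any(load > LOAD_LIMIT for load in loads.values()):
        return specs, assign(clusters, weights_of(specs))
    grown = dict(specs)
    grown[new_name] = new_spec  # stands in for creating the third acquisition node
    return grown, assign(clusters, weights_of(grown))


# Hypothetical node specs as (cores, memory) and monitored load values.
specs = {"A": (2, 4), "B": (4, 8), "C": (4, 8), "D": (2, 4), "E": (8, 16)}
loads = {"A": 0.4, "B": 0.95, "C": 0.6, "D": 0.5, "E": 0.7}
clusters = [f"redis-cluster-{i}" for i in range(100)]
new_specs, lists = rebalance_on_overload(specs, loads, clusters, "F", (4, 8))
```

Because the weights of all nodes are recomputed, every original node's share of the ring shrinks once node F is added, not just the share of the overloaded node B.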
The application can automatically discover a newly created cluster and map it onto the hash ring by calculating its hash value, so that the cluster is automatically added to the monitoring list of the corresponding acquisition service. The application can realize automatic load balancing and automatic scaling of acquisition nodes: when the load of an acquisition node is too high, a new acquisition node is added automatically and the load is rebalanced. Conversely, when the load of the acquisition nodes is too low, the number of acquisition nodes can be reduced so that resources are fully utilized.
In practical application, the cluster deployment environment can be K8S, a newly created cluster can be automatically found by a watch technology based on the K8S, and the cluster is automatically added into a monitored list for monitoring; and automatically discovering the newly added acquisition service, and adding the acquisition service into an acquisition service list.
In an embodiment, at least one of the at least two collection nodes is deleted in case the load value of the fifth collection node is smaller than the threshold value. The number of acquisition nodes to be deleted can be determined according to the range of the load value of the fifth acquisition node.
For example, the threshold is set to 0.3, and if the load value of the fifth acquisition node is less than 30%, one acquisition node is deleted. And if the load value of the fifth acquisition node is less than 20%, deleting 2 acquisition nodes. If the load value of the fifth acquisition node is less than 10%, deleting 3 acquisition nodes.
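The tiered rule in this example can be written as a small function. The thresholds are those given in the text; the `min_nodes` floor, which prevents deleting every acquisition node, is an added assumption, as is the function name.

```python
def nodes_to_delete(load_value, node_count, min_nodes=1):
    """Return how many acquisition nodes to delete for a monitored load value in [0, 1].

    Thresholds follow the example in the text. ASSUMPTION: at least
    `min_nodes` acquisition nodes are always kept.
    """
    if load_value < 0.10:
        wanted = 3
    elif load_value < 0.20:
        wanted = 2
    elif load_value < 0.30:
        wanted = 1
    else:
        wanted = 0
    return max(0, min(wanted, node_count - min_nodes))
```

For instance, with 5 acquisition nodes a load value of 0.15 leads to 2 deletions, while with only 2 acquisition nodes a load value of 0.05 is capped at 1 deletion by the assumed floor.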
In an embodiment, after assigning the collection nodes to at least one cluster according to the collection ranges of the respective collection nodes on the hash ring, the method further comprises:
determining the number of clusters acquired by a fourth acquisition node in batches each time according to hardware configuration information of the fourth acquisition node and the total number of the distributed clusters; the fourth collection node collects the total number of the distributed clusters in batches; the fourth collection node is any one of the at least two collection nodes.
Each acquisition node corresponds to a monitoring list, which records the information of the clusters that the acquisition node needs to collect. For example, suppose the total number of clusters assigned to the fourth acquisition node is 100. If the fourth acquisition node collected data from all 100 clusters at the same time, its load could become very high and the node could crash. The 100 clusters are therefore collected in batches: for example, with the number of batches set to 5, 20 clusters are collected per batch. The fourth acquisition node collects data from 20 clusters at a time and works through the 100 clusters in sequence. It should be appreciated that the clusters collected by the fourth acquisition node in different batches do not overlap.
This embodiment can determine the number of clusters collected by the fourth acquisition node in each batch according to the hardware configuration information of the fourth acquisition node and the total number of clusters assigned to it. For example, assume that the acquisition time interval is T, in seconds; the number of processor cores of the fourth acquisition node is C, in cores; the memory of the fourth acquisition node is M, in GB; and the total number of clusters assigned to the fourth acquisition node is S. The number of batches T_mean collected by the fourth acquisition node within the time interval T, and the number of clusters S_mean collected in each batch, can be calculated by the following formula: [formula not reproduced in this text]
If the total number of assigned clusters S is so small that S_mean would be less than 1, S_mean is set to 1.
According to the embodiment, the clusters in the monitoring list are collected in batches, so that the situations of delay and excessively high instantaneous load caused by simultaneous collection of a large amount of data can be avoided.
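The batching described above can be sketched as follows. The formula that derives the number of batches from T, C, M and S is not reproduced in this text, so the sketch takes the number of batches as an input; the function name `split_into_batches` is hypothetical.

```python
import math
from typing import List, Sequence


def split_into_batches(clusters: Sequence[str], batch_count: int) -> List[List[str]]:
    """Split a monitoring list into non-overlapping batches.

    batch_count is supplied by the caller (an assumption of this sketch,
    since the formula for it is not available here). The per-batch size
    is floored at 1, matching the rule that S_mean is set to 1 when it
    would otherwise be less than 1.
    """
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    per_batch = max(1, math.ceil(len(clusters) / batch_count))
    return [list(clusters[i:i + per_batch])
            for i in range(0, len(clusters), per_batch)]


# Example from the text: 100 clusters collected in 5 batches of 20.
monitoring_list = [f"cluster-{i}" for i in range(100)]
batches = split_into_batches(monitoring_list, 5)
```

Because the batches are contiguous slices of the monitoring list, no cluster is collected twice within one round.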
In fig. 1, when the service discovery component, which watches etcd, detects a change in the watched information, the clusters to be collected are distributed to the corresponding acquisition services for data collection through the improved consistent hash ring of the application. As shown in fig. 7, a change in the Redis list indicates that a new Redis cluster has been added. When the service discovery component detects such a change, it calculates the hash value of the new Redis cluster through the consistent hash algorithm and maps the hash value onto the hash ring, thereby assigning a monitoring node (that is, an acquisition service) to the new Redis cluster; it then updates the Redis list that the acquisition service is monitoring.
When the service discovery component detects a change in the monitoring list, which indicates that an acquisition service has been added or removed, the load of all acquisition services is adjusted through the consistent hash algorithm to achieve load balancing. The application automatically calculates a weight from the processor and memory of each acquisition node to represent its processing performance, so that the optimal number of clusters scheduled to the node can be calculated.
In one embodiment, the number of virtual nodes per collection node, which refers to the maximum number of clusters that a collection node can collect, may be calculated by the following formula.
$X = N \times \text{virtual node multiple}$
Wherein X is the total number of virtual nodes of the consistent hash ring, C is the total processor core number, M is the total memory capacity, and N is the total number of acquisition nodes.
In the related art, the virtual node multiple is often preset according to experience, and cannot be flexibly changed according to the change of the environment.
After the weight $W_i$ of each collection node is calculated by the above embodiment, the number of virtual nodes $N_i$ of each collection node may be calculated according to the weight and the total number of virtual nodes:
$N_i = M \times W_i$
The number of virtual nodes indicates the maximum number of clusters that a collection node can collect, so it can be used to judge the load of the node. For example, when the number of clusters that a collection node is currently collecting is close to its number of virtual nodes, the load of the node can be judged to be too high.
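A minimal sketch of this calculation and of the load check follows. The rounding rule, the floor of one virtual node per collection node, and the 0.9 ratio used to decide that a count is "close to" the limit are all assumptions of the sketch; the text does not specify them.

```python
from typing import Dict


def virtual_node_counts(weights: Dict[str, float],
                        total_virtual_nodes: int) -> Dict[str, int]:
    """Share the total number of virtual nodes among nodes by weight.

    Rounding to the nearest integer and keeping at least one virtual
    node per collection node are assumptions of this sketch.
    """
    return {node: max(1, round(total_virtual_nodes * weight))
            for node, weight in weights.items()}


def is_overloaded(current_clusters: int, virtual_nodes: int,
                  ratio: float = 0.9) -> bool:
    """Treat a node as overloaded once it nears its virtual-node count.

    The 0.9 ratio is a hypothetical reading of "close to".
    """
    return current_clusters >= ratio * virtual_nodes


counts = virtual_node_counts({"node-a": 0.5, "node-b": 0.3, "node-c": 0.2}, 100)
```

With weights 0.5, 0.3 and 0.2 and 100 virtual nodes in total, the three collection nodes receive 50, 30 and 20 virtual nodes respectively.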
Referring to fig. 8, fig. 8 is a schematic diagram of a cluster structure according to an embodiment of the present invention. As can be seen from fig. 8, the acquisition services and the Redis clusters are independent of each other, so that the clusters and the acquisition services are completely decoupled: acquisition services and Redis clusters can be added or removed without affecting each other, and a many-to-many relationship can be achieved, that is, a plurality of acquisition services correspond to a plurality of Redis clusters.
In practical application, the cluster provides open code functions and a configuration file. For example, service.getsingleinstanceInfo(value) is a development function provided by the cluster. Through this function, a developer can customize a specific collection mode according to the access protocols of different products, while collection is performed with Go's lightweight coroutines (goroutines) so that resources are utilized to the greatest extent. The service can be adapted to different products by modifying the code function, and the configuration file can be modified at any time as required to flexibly change the monitored items and content.
Different from the traditional data acquisition mode that each cluster needs to be allocated with one exporter, the application designs a data acquisition mode for large-scale clusters, which can realize dynamic discovery of clusters and acquisition services, and the acquisition services and the clusters are fully decoupled, so that the data acquisition method is more portable and flexible. And the automatic load balancing of the acquisition nodes and the automatic transverse expansion of the acquisition nodes are realized, manual intervention is not needed, meanwhile, batch acquisition is carried out on cluster services of a monitoring list, the situation that time delay and excessive instantaneous load are caused by simultaneous acquisition of a large amount of data is avoided, and the best acquisition performance and full utilization of resources are realized. And open code functions and configuration files are provided, so that monitoring expansion and product adaptation can be more conveniently carried out.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The technical schemes described in the embodiments of the present invention may be arbitrarily combined provided that they do not conflict.
In addition, in the embodiments of the present invention, "first", "second", etc. are used to distinguish similar objects and are not necessarily used to describe a particular order or precedence.
Referring to fig. 9, fig. 9 is a schematic diagram of a data acquisition device according to an embodiment of the present invention, as shown in fig. 9, the device includes:
the computing module is used for computing the weight of each acquisition node based on the hardware configuration information of each acquisition node in the at least two acquisition nodes; the weight represents the probability of the cluster being distributed to the acquisition node; the acquisition node is used for acquiring data of a corresponding cluster;
the determining module is used for determining the acquisition range of each acquisition node on the hash ring according to the weight;
and the distribution module is used for distributing the collection nodes for at least one cluster according to the collection range of each collection node on the hash ring.
In an embodiment, the allocation module allocates the collection node to at least one cluster according to the collection range of each collection node on the hash ring, including:
calculating hash values corresponding to all clusters, and determining the positions of all clusters on the hash ring according to the hash values;
And matching the positions of the clusters on the hash ring with the acquisition ranges of the acquisition nodes respectively, and distributing the acquisition nodes for the clusters according to the matching result.
In an embodiment, the allocating module allocates the collection node to each cluster according to the matching result, including:
if the matching result represents that the position of the cluster on the hash ring is positioned in the acquisition range of the first acquisition node, the cluster is distributed to the first acquisition node; the first acquisition node is any one of the at least two acquisition nodes; the first collection node is used for collecting data of the cluster.
In an embodiment, the determining module determines, according to the weight, an acquisition range of each acquisition node on the hash ring, including:
determining the length of an arc line of the corresponding acquisition node on the hash ring according to the weight; the arc represents the acquisition range of the corresponding acquisition node on the hash ring; wherein the sum of the weights of all the at least two collection nodes is equal to 1.
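The arc-based assignment can be sketched as follows. The ring size of 2^32, the use of SHA-256 to place clusters on the ring, and the names `arc_ranges`, `ring_position` and `assign_node` are assumptions made for illustration; the text only states that each node's arc length follows its weight and that the weights sum to 1.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple

RING_SIZE = 2 ** 32  # assumed ring size; the text does not fix one


def arc_ranges(weights: Dict[str, float]) -> List[Tuple[int, str]]:
    """Return (arc_end, node) pairs; a node owns [previous_end, arc_end)."""
    total = sum(weights.values())  # expected to be 1; normalised defensively
    ranges: List[Tuple[int, str]] = []
    cumulative = 0.0
    for node, weight in weights.items():
        cumulative += weight / total
        ranges.append((int(cumulative * RING_SIZE), node))
    # Close the ring exactly, guarding against floating-point drift.
    ranges[-1] = (RING_SIZE, ranges[-1][1])
    return ranges


def ring_position(cluster_id: str) -> int:
    """Map a cluster identifier to a position in [0, RING_SIZE)."""
    digest = hashlib.sha256(cluster_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def assign_node(cluster_id: str, ranges: List[Tuple[int, str]]) -> str:
    """Return the node whose arc contains the cluster's ring position."""
    ends = [end for end, _ in ranges]
    index = bisect.bisect_right(ends, ring_position(cluster_id))
    return ranges[index][1]


weights = {"node-a": 0.5, "node-b": 0.3, "node-c": 0.2}
ranges = arc_ranges(weights)
owner = assign_node("redis-cluster-1", ranges)
```

Because a node with a larger weight owns a longer arc, a uniformly hashed cluster lands on it with proportionally higher probability, which matches the description of the weight as the probability of a cluster being distributed to the node.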
In an embodiment, the device further comprises:
the creating module is used for creating a third acquisition node under the condition that the load value of the second acquisition node is monitored to be larger than a set value; the second acquisition node is any one of the at least two acquisition nodes;
The weight calculation module is used for calculating weights of the at least two acquisition nodes and the third acquisition node;
determining the acquisition ranges of the at least two acquisition nodes and the third acquisition node on the hash ring according to the weight;
the distribution module is used for redistributing the collection nodes for the at least one cluster according to the collection ranges of the at least two collection nodes and the third collection node on the hash ring.
In an embodiment, the device further comprises:
the batch acquisition module is used for determining the number of clusters acquired by the fourth acquisition node in batches each time according to the hardware configuration information of the fourth acquisition node and the total number of the allocated clusters; the fourth collection node collects the total number of the distributed clusters in batches; the fourth collection node is any one of the at least two collection nodes.
In an embodiment, the hardware configuration information includes: collecting the number of processor cores and the memory capacity of the node;
correspondingly, the calculating module calculates the weight of each collection node based on the hardware configuration information of each collection node in at least two collection nodes, and the calculating module comprises:
And respectively calculating the weight of each acquisition node based on the processor core number and the memory capacity of each acquisition node.
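One possible way to derive such weights is sketched below. The exact weight formula is not given in this part of the text, so the equal blend of each node's share of total processor cores and share of total memory is purely an assumption, as is the function name `node_weights`; the only property taken from the text is that the weights of all collection nodes sum to 1.

```python
from typing import Dict, Tuple


def node_weights(nodes: Dict[str, Tuple[int, float]]) -> Dict[str, float]:
    """Compute a weight per node from (processor cores, memory in GB).

    Assumed formula (not taken from the text): the average of the node's
    share of all cores and its share of all memory, so weights sum to 1.
    """
    total_cores = sum(cores for cores, _ in nodes.values())
    total_memory = sum(memory for _, memory in nodes.values())
    return {
        name: 0.5 * (cores / total_cores) + 0.5 * (memory / total_memory)
        for name, (cores, memory) in nodes.items()
    }


example_weights = node_weights({
    "node-a": (8, 16.0),
    "node-b": (4, 8.0),
    "node-c": (4, 8.0),
})
```

In this example the node with twice the cores and memory of the others receives a weight of 0.5, and the other two receive 0.25 each.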
In practice, the calculation module, the allocation module and the determination module may be implemented by a processor in the electronic device, such as a central processing unit (CPU, Central Processing Unit), a digital signal processor (DSP, Digital Signal Processor), a micro control unit (MCU, Microcontroller Unit) or a field-programmable gate array (FPGA, Field-Programmable Gate Array), etc.
It should be noted that: in the data acquisition device provided in the above embodiment, only the division of the above modules is used for illustration, and in practical application, the above processing allocation may be performed by different modules according to needs, that is, the internal structure of the device is divided into different modules, so as to complete all or part of the above processing. In addition, the data acquisition device and the data acquisition method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
The data acquisition device may take the form of an image file which, when executed, runs as a container or a virtual machine to implement the data acquisition method. Of course, the device is not limited to the image file form; any software form capable of implementing the data acquisition method of the present application falls within the scope of the present application, for example a software module implemented in a hypervisor (virtual machine monitor) of a cloud computing platform.
The embodiment of the application also provides an electronic device, based on a hardware implementation of the program modules described above, to implement the method of the embodiment of the application; the data acquisition method is executed by a processor of the electronic device. Fig. 10 is a schematic diagram of a hardware composition structure of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device includes:
a communication interface capable of information interaction with other devices such as a network device and the like;
and a processor connected with the communication interface to realize information interaction with other devices, and configured to execute, when running the computer program, the method provided by one or more of the technical schemes on the electronic device side. The computer program is stored in the memory.
Of course, in practice, the various components in the electronic device are coupled together by a bus system. It will be appreciated that a bus system is used to enable connected communications between these components. The bus system includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus systems in fig. 10.
In the application, the electronic device can be a single hardware device or a cluster formed by a plurality of hardware devices, such as a cloud computing platform. A cloud computing platform is a cluster device that organizes a plurality of independent server physical hardware resources into pooled resources, and provides the required virtual resources and services externally.
The memory in the embodiments of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be read-only memory (ROM, Read-Only Memory), programmable read-only memory (PROM, Programmable Read-Only Memory), erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), flash memory (Flash Memory), magnetic surface memory, optical disk, or compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be random access memory (RAM, Random Access Memory), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synchronous link dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory described by embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The embodiment of the invention also provides a cloud computing platform which comprises a data processing software module for the data acquisition method, wherein the data processing software module is used for realizing the steps of the data acquisition method provided by the embodiment of the invention.
A cloud computing platform is a service form that uses computing virtualization, network virtualization, and storage virtualization technologies to organize the physical hardware resources of a plurality of independent servers into pooled resources. It is a structure for defining resources that builds on the development of virtualization technology, and it can provide resource capacity in forms such as virtual machines and containers. By eliminating the fixed relation between hardware and operating system, relying on network connectivity for unified resource scheduling, and then providing the required virtual resources and services, it offers a new mode of IT (information technology) and software delivery that is flexible, elastic, distributed, multi-tenant, and on-demand.
Current cloud computing platforms support several service modes:
SaaS (Software as a Service): the cloud computing platform user does not need to purchase the software but instead rents software deployed on the cloud computing platform; the user does not need to maintain the software, and the software service provider takes full responsibility for managing and maintaining it;
PaaS (Platform as a Service): a cloud computing platform user (typically a software developer in this case) may build new applications on the architecture provided by the cloud computing platform, or extend existing applications, without having to purchase development, quality control, or production servers;
IaaS (Infrastructure as a Service): the cloud computing platform provides data centers, infrastructure hardware and software resources through the internet, and the cloud computing platform in the IaaS mode can provide servers, operating systems, disk storage, databases and/or information resources.
The method disclosed by the embodiment of the application can be applied to a processor or realized by the processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the application can be directly embodied in the hardware of the decoding processor or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium having a memory, and the processor reads the program in the memory and performs the steps of the method in combination with its hardware.
Optionally, when the processor executes the program, a corresponding flow implemented by the electronic device in each method of the embodiment of the present application is implemented, and for brevity, will not be described herein.
In an exemplary embodiment, the present application also provides a storage medium, i.e. a computer storage medium, in particular a computer readable storage medium, for example comprising a first memory storing a computer program, which is executable by a processor of an electronic device to perform the steps of the aforementioned method. The computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device, and method may be implemented in other manners. The device embodiments described above are only illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interface, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The technical schemes described in the embodiments of the present application may be arbitrarily combined provided that they do not conflict.
In addition, in the present examples, "first," "second," etc. are used to distinguish similar objects and not necessarily to describe a particular order or sequence.
The foregoing is merely a description of specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of data acquisition, the method comprising:
calculating the weight of each acquisition node based on the hardware configuration information of each acquisition node in the at least two acquisition nodes; the weight represents the probability of the cluster being distributed to the acquisition node; the acquisition node is used for acquiring data of a corresponding cluster;
determining the acquisition range of each acquisition node on the hash ring according to the weight;
and distributing the acquisition nodes for at least one cluster according to the acquisition range of each acquisition node on the hash ring.
2. The method of claim 1, wherein the assigning the collection node to the at least one cluster based on the collection range of each collection node on the hash ring comprises:
calculating hash values corresponding to all clusters, and determining the positions of all clusters on the hash ring according to the hash values;
and matching the positions of the clusters on the hash ring with the acquisition ranges of the acquisition nodes respectively, and distributing the acquisition nodes for the clusters according to the matching result.
3. The method according to claim 2, wherein the assigning the collection node to each cluster according to the matching result comprises:
If the matching result represents that the position of the cluster on the hash ring is positioned in the acquisition range of the first acquisition node, the cluster is distributed to the first acquisition node; the first acquisition node is any one of the at least two acquisition nodes; the first collection node is used for collecting data of the cluster.
4. The method of claim 1, wherein determining the collection range of each collection node on the hash ring according to the weight comprises:
determining the length of an arc line of the corresponding acquisition node on the hash ring according to the weight; the arc represents the acquisition range of the corresponding acquisition node on the hash ring; wherein the sum of the weights of all the at least two collection nodes is equal to 1.
5. The method according to claim 1, wherein the method further comprises:
under the condition that the load value of the second acquisition node is monitored to be larger than a set value, a third acquisition node is established; the second acquisition node is any one of the at least two acquisition nodes;
calculating weights of the at least two acquisition nodes and the third acquisition node;
Determining the acquisition ranges of the at least two acquisition nodes and the third acquisition node on the hash ring according to the weight;
and reallocating the acquisition nodes for the at least one cluster according to the acquisition ranges of the at least two acquisition nodes and the third acquisition node on the hash ring.
6. The method of claim 1, wherein after assigning the collection nodes to at least one cluster based on the collection ranges of the respective collection nodes on the hash ring, the method further comprises:
determining the number of clusters acquired by a fourth acquisition node in batches each time according to hardware configuration information of the fourth acquisition node and the total number of the distributed clusters; the fourth collection node collects the total number of the distributed clusters in batches; the fourth collection node is any one of the at least two collection nodes.
7. The method of claim 1, wherein the hardware configuration information comprises: collecting the number of processor cores and the memory capacity of the node;
correspondingly, the calculating the weight of each collection node based on the hardware configuration information of each collection node in the at least two collection nodes comprises the following steps:
And respectively calculating the weight of each acquisition node based on the processor core number and the memory capacity of each acquisition node.
8. A data acquisition device, comprising:
the computing module is used for computing the weight of each acquisition node based on the hardware configuration information of each acquisition node in the at least two acquisition nodes; the weight represents the probability of the cluster being distributed to the acquisition node; the acquisition node is used for acquiring data of a corresponding cluster;
the determining module is used for determining the acquisition range of each acquisition node on the hash ring according to the weight;
and the distribution module is used for distributing the collection nodes for at least one cluster according to the collection range of each collection node on the hash ring.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the data acquisition method according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the data acquisition method according to any one of claims 1 to 7.
CN202210992064.9A 2022-08-17 2022-08-17 Data acquisition method and device, electronic equipment and storage medium Pending CN116800739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210992064.9A CN116800739A (en) 2022-08-17 2022-08-17 Data acquisition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116800739A true CN116800739A (en) 2023-09-22

Family

ID=88042632

Country Status (1)

Country Link
CN (1) CN116800739A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination