CN117061512A - Distributed information management method and system based on big data

Info

Publication number: CN117061512A
Application number: CN202311010839.9A
Authority: CN
Inventors: 周实奇; 陈雅娟; 陈辉; 黄倚霄
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Priority date: 2023-08-10
Filing date: 2023-08-10
Publication date: 2023-11-14

Abstract

The application discloses a distributed information management method and system based on big data. The method comprises the following steps: obtaining a first master node and a child node group from a data cluster according to the priority of the data cluster system; constructing a distributed information model for service distribution according to the first master node and the child node group; acquiring, based on the distributed information model, a service subscription request sent by the child node group to the first master node; and feeding corresponding service data back to a first child node in the child node group according to the service subscription request, and synchronously transmitting the service data to the first other child nodes through the distributed information model, so as to obtain an information synchronization result of the data cluster and a synchronous transmission result of the service data. The application addresses how to synchronize information across a data cluster in a high-availability state, and in particular how to keep data transmission to each subsystem synchronized when the data transmission link between the main system and a subsystem fails and is then reconnected.

Description

Distributed information management method and system based on big data

Technical Field

The present application relates to the field of information management technologies, and in particular, to a distributed information management method, system, device, and storage medium based on big data.

Background

When a main system calls data and distributes it to a plurality of subordinate subsystems, it must process massive amounts of service data. If the network of the main system suddenly fails, the whole system is liable to be paralyzed, which affects the normal operation of the other child nodes. In the prior art, to reduce the impact of a main system failure on the other child nodes, the data transmission link is monitored with heartbeat packets, and a standby node is set up for the main system to temporarily store data, so that when the main system fails, the standby node takes over data distribution. However, this approach greatly increases the management cost of data storage and calling. Moreover, after the main system recovers, data fluctuation easily occurs when distribution is switched back from the standby node to the main system, which affects the consistency of data transmission.

CN202110721416.2 discloses a distributed cluster state information management method, system and electronic device for cross-domain big data platform. The distributed cluster state information management method of the cross-domain big data platform belongs to the field of data processing and comprises the following steps: the method comprises the steps that external open node list information of a headquarter big data platform and each big data platform of a province big data platform included in a cross-domain big data platform is backed up in a Zookeeper service assembly in a cluster of each big data platform, a leader node in the cluster is cached, and each node in the cluster is registered as a client side of the Zookeeper assembly; and registering the provincial large data platform in the headquarter large data platform, and finally completing automatic backup and caching of the information of the clusters in the headquarter large data platform to complete online operation of the provincial large data platform, and synchronously notifying the latest cross-domain information cache to the external open nodes of each online cluster through the RPC, wherein the headquarter large data platform maintains an RPC heartbeat mechanism for the external open node list of the provincial large data platform.

CN201811546587.0 discloses a method and device for managing a big data cluster, which are applied to any cluster node in the big data cluster, wherein the cluster node is provided with a zookeeper, and the method comprises the following steps: the first cluster node elects a temporary master node; when the first cluster node is selected as a temporary master node, determining whether a cluster node exists in a zookeeper; if yes, using a cluster information management cluster node unit in the zookeeper to store and manage cluster service data of the big data cluster; otherwise, creating a cluster node unit in the zookeeper, and using the cluster node unit to store and manage cluster service data of the big data cluster. The method can solve the problem of system paralysis caused by single-point failure of the main node in the cluster system.

CN202010953828.4 discloses a distributed high-availability big data mining task scheduling system, which comprises a data mining scheduling module, a resource server cluster, a service server cluster, a Zookeeper cluster and a user operation end; the resource monitoring module is connected with the resource server cluster; the task queue module is connected with the service server cluster; the service server cluster is connected with the data analysis module through the Zookeeper cluster. The application can optimally select the service server which is most suitable for running the mining task, and finally pushes the task to the selected service server to perform data mining, thereby completing the automatic operation of the whole life cycle of the data mining task and greatly improving the stability of task running and the data mining efficiency.

In the first patent above, cluster information is backed up and stored in the headquarters big data platform, the latest cross-domain information cache is synchronously notified to the online clusters through RPC, and the relationship between the provincial platforms and headquarters is established through the externally open node list information of each big data platform, thereby realizing data synchronization. In the second patent, cluster service data is temporarily stored in a cluster node and temporarily managed through Zookeeper, that is, a temporary data backup is made and used for data transmission when a node fails; however, the problem of asynchronous transmission caused by data fluctuation after the master node recovers is not solved. In the third patent, the service server cluster and the data analysis module are connected through a Zookeeper cluster, the most suitable service server is selected through Zookeeper, the task is pushed to it and data mining is executed, and data calling is carried out through a load scheduling method, a polling scheduling method and an inclination scheduling method.

Disclosure of Invention

The present application aims to solve at least one of the technical problems in the related art to some extent.

Therefore, a first object of the present application is to provide a distributed information management method based on big data that addresses how to synchronize information across a data cluster in a high-availability state, and in particular how to keep data transmission to each subsystem synchronized when the data transmission link between the main system and a subsystem fails and is then reconnected.

A second object of the present application is to propose a distributed information management system based on big data.

A third object of the application is to propose a computer device.

A fourth object of the present application is to propose a non-transitory computer readable storage medium.

To achieve the above object, an embodiment of a first aspect of the present application provides a distributed information management method based on big data, including:

obtaining a first main node and a sub node group based on the data cluster according to the priority of the data cluster system;

constructing a distributed information model for service distribution according to the first main node and the sub-node group;

acquiring a service subscription request sent by a child node group to a first main node based on a distributed information model;

and feeding corresponding service data back to a first sub-node in the sub-node group according to the service subscription request, and synchronously transmitting the service data to the first other sub-node through the distributed information model so as to obtain an information synchronous result of the data cluster and a synchronous transmission result of the service data.

The big data-based distributed information management method according to the embodiment of the present application may further have the following additional technical features:

In one embodiment of the application, the first master node is at the highest priority and the child node group is at the next-highest priority.

In one embodiment of the present application, after obtaining the information synchronization result of the data cluster and the synchronization transmission result of the service data, the method further includes:

acquiring transmission link fault data for a transmission link fault caused by a network interruption at the first master node;

reducing the priority of the first master node, which has the highest priority, based on the transmission link fault data, and selecting the child node with the highest priority from the child node group as a second master node;

and distributing data in response to the service requests of the second other child nodes based on the second master node.

In one embodiment of the application, the method further comprises:

obtaining network connection stability monitoring data of connection faults of the first main node and the sub-node group based on a monitoring mechanism;

obtaining preemptive master node requests sent by the child node groups to the distributed information model respectively based on the network connection stability monitoring data;

and adjusting the child node which is successfully preempted to the highest priority based on the preemptive master node request to serve as a third master node, and synchronously transmitting service information to third other child nodes.

In one embodiment of the present application, the selecting, as the second master node, the child node with the highest priority from the child node group includes:

local optimization is carried out on the sub-node group through a plurality of types of artificial bees in the artificial bee colony algorithm to obtain a local optimization result;

and summarizing the local optimizing result to the optimizing result of the whole bee colony, comparing the optimizing result, and taking the child node with the highest priority obtained according to the comparing result as a second main node.

In one embodiment of the present application, the constructing a distributed information model for service distribution according to the first master node and the child node group includes:

acquiring a connection relationship between a first main node and a child node group;

and carrying out data training on the connection relation and the transaction ID of the corresponding sub-node group through the Zookeeper so as to obtain a distributed information model of service distribution according to a data training result.

In one embodiment of the present application, feeding back corresponding service data to a first child node in a child node group according to the service subscription request, and synchronously transmitting the service data to a first other child node through a distributed information model, including:

searching corresponding user history bill information in the first main node according to the inquired user history bill in the service subscription request;

transmitting user history bill information corresponding to the transaction ID of the child node group to the corresponding first child node in the child node group through the distributed information model; and

synchronously transmitting the corresponding user history bill information to the first other child nodes in the child node group through the distributed information model.

In one embodiment of the application, the data cluster is prioritized according to the chronological order of registration times.

In one embodiment of the present application, the obtaining network connection stability monitoring data of a connection failure between a first master node and a child node group based on a listening mechanism includes:

binding corresponding listening events to each child node in the child node group based on the transaction ID of the child node group to obtain a listening event binding result;

and monitoring changes in child node data and changes in child node state based on the listening event binding result, and obtaining, from the monitored changes, network connection stability monitoring data indicating whether the network transmission link between the first master node and the child node group is abnormal.

In one embodiment of the present application, the adjusting the child node which is successfully preempted to the highest priority based on the preemptive master node request to serve as the third master node includes:

according to the request of the preemptive master node, and according to the number of the stored user data of the sub-node group, carrying out weight calculation on each sub-node to obtain a weight calculation result;

and selecting a child node with the optimal weight in the child node group as a third main node according to the weight calculation result.

To achieve the above object, a second aspect of the present application provides a distributed information management system based on big data, including:

the node hierarchical dividing module is used for obtaining a first main node and a sub node group based on the data cluster according to the priority of the data cluster system;

the information model construction module is used for constructing a distributed information model for service distribution according to the first main node and the sub-node group;

the service request acquisition module is used for acquiring a service subscription request sent by the child node group to the first main node based on the distributed information model;

and the data synchronous transmission module is used for feeding corresponding service data back to a first sub-node in the sub-node group according to the service subscription request, and synchronously transmitting the service data to a first other sub-node through the distributed information model so as to obtain an information synchronous result of the data cluster and a synchronous transmission result of the service data.

The distributed information management method and the system based on big data realize the data synchronization among all the sub-nodes, so that the service distribution can be performed in time through the distributed information model when the main node has a connection fault, thereby avoiding the influence of the main node fault on the information delay or the asynchronization of the whole system and improving the data synchronism under the data cluster.

To achieve the above object, an embodiment of a third aspect of the present application provides a computer apparatus, including: a processor and a memory; wherein the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for implementing the big data based distributed information management method as described in the embodiment of the first aspect.

To achieve the above object, a fourth aspect of the present application provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the big data based distributed information management method according to the first aspect of the embodiment.

Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart of a big data based distributed information management method according to an embodiment of the present application;

FIG. 2 is a flow chart of data processing when a network interruption at the master node causes the transmission link to fail according to an embodiment of the present application;

FIG. 3 is a flow chart of data processing when a connection failure occurs at a master node according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a distributed information management system based on big data according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

The following describes a big data based distributed information management method, system, computer device and storage medium according to an embodiment of the present application with reference to the accompanying drawings.

Example 1:

FIG. 1 is a flowchart of a big data based distributed information management method according to an embodiment of the present application.

As shown in fig. 1, the method includes, but is not limited to, the steps of:

S1, obtaining a first main node and a sub node group based on the data cluster according to the priority of the data cluster system.

Illustratively, the first master node is at the highest priority and the group of child nodes is at a next-to-highest priority.

For example, prioritization based on data clusters may be performed by the chronological order of registration times.

Specifically, according to the priority of the system, one master node with the highest priority and a plurality of child nodes with the next-highest priority are selected from the data cluster (in this embodiment, priority is assigned by registration time).
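By way of a non-limiting illustration, the division in step S1 could be sketched in Python as follows. The `Node` type, its fields and the `divide_by_priority` function are hypothetical names that do not appear in the application, and the sketch assumes that an earlier registration time means a higher priority, with the node identifier used only to break ties.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Node:
    node_id: str             # hypothetical identifier for a cluster node
    registered_at: datetime  # registration time, used as the priority key


def divide_by_priority(cluster):
    """Return (first_master, child_group), ranking earlier registrations higher."""
    if not cluster:
        raise ValueError("data cluster is empty")
    # Sort by registration time; the node ID only breaks exact ties.
    ranked = sorted(cluster, key=lambda n: (n.registered_at, n.node_id))
    return ranked[0], ranked[1:]


# Made-up example cluster.
cluster = [
    Node("node-b", datetime(2023, 8, 2)),
    Node("node-a", datetime(2023, 8, 1)),
    Node("node-c", datetime(2023, 8, 3)),
]
first_master, child_group = divide_by_priority(cluster)
```

In this example the earliest-registered node, `node-a`, becomes the first master node and the remaining nodes form the child node group in registration order.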

S2, constructing a distributed information model for service distribution according to the first main node and the sub-node group.

By way of example, a distributed information model for service distribution can be built in the data cluster through a Zookeeper, and the master node synchronously transmits data to a plurality of child nodes through the distributed information model, so that information synchronization of the data cluster is realized.

In one embodiment of the application, the connection relationship between the main node and the sub node and the corresponding transaction ID are subjected to data training through the Zookeeper, so that a distributed information model is obtained.
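One possible reading of the distributed information model is a registry that records which child node is bound to which transaction ID under a given master node. The Python sketch below follows that reading only; a plain in-memory dictionary stands in for the Zookeeper-backed store, no Zookeeper client API is used, and `DistributedInfoModel` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DistributedInfoModel:
    master_id: str
    # transaction ID -> child node ID (the master-to-child connection relation)
    links: dict = field(default_factory=dict)

    def register_child(self, transaction_id, child_id):
        """Bind a child node to a transaction ID; each ID may be bound once."""
        if transaction_id in self.links:
            raise ValueError(f"transaction ID {transaction_id!r} already bound")
        self.links[transaction_id] = child_id

    def child_for(self, transaction_id):
        """Look up the child node that owns a transaction ID."""
        return self.links[transaction_id]

    def other_children(self, child_id):
        """All registered child nodes except the given one, in sorted order."""
        return sorted(c for c in self.links.values() if c != child_id)


# Made-up example bindings.
model = DistributedInfoModel(master_id="node-a")
model.register_child("tx-001", "node-b")
model.register_child("tx-002", "node-c")
model.register_child("tx-003", "node-d")
```

With such a registry, the master node can resolve the requesting child node from a transaction ID and enumerate the other child nodes that should receive a synchronized copy.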

S3, acquiring a service subscription request sent by the child node group to the first main node based on the distributed information model.

In one embodiment of the application, the child node sends a service subscription request to the master node through the distributed information model. Therefore, the master node forwards the data according to the sent service request.

S4, feeding corresponding service data back to a first sub-node in the sub-node group according to the service subscription request, and synchronously transmitting the service data to the first other sub-nodes through the distributed information model so as to obtain an information synchronous result of the data cluster and a synchronous transmission result of the service data.

In one embodiment of the application, the master node feeds back corresponding service data to the corresponding sub-nodes according to the data subscription request, and synchronously transmits the service data to other sub-nodes through the distributed information model so as to realize synchronous transmission of the service data.

Illustratively, searching corresponding user history bill information in the first master node according to the inquired user history bill in the service subscription request; transmitting user history bill information corresponding to the transaction ID of the sub-node group to a corresponding first sub-node in the sub-node group through a distributed information model; and synchronously transmitting the corresponding user history bill information to the first other child nodes in the child node group through the distributed information model.

Specifically, the corresponding service data (such as user history bill information) is looked up in the master node according to the service name (such as user history bill query) in the service subscription request and is sent, through the distributed information model, to the child node identified by the transaction ID. The same service data is also synchronized to the other child nodes so that they can back up both the service data (user history bill information) and the current service query event, which reduces the possibility of data loss when the master node fails.
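Steps S3 and S4 can be illustrated with a small in-process Python sketch. It is an assumption-laden simplification: the `Master` and `Child` classes, the `receive` and `handle_subscription` methods, and the sample bill strings are all hypothetical, and direct method calls stand in for transmission through the distributed information model.

```python
class Child:
    """Hypothetical child node that records what it was sent."""

    def __init__(self, name):
        self.name = name
        self.delivered = {}  # data this child subscribed to itself
        self.backups = {}    # data synchronized from other children's requests

    def receive(self, service_name, data, backup):
        target = self.backups if backup else self.delivered
        target[service_name] = data


class Master:
    """Hypothetical first master node holding the service data."""

    def __init__(self, service_data, children_by_tx):
        self.service_data = service_data      # service name -> service data
        self.children_by_tx = children_by_tx  # transaction ID -> Child

    def handle_subscription(self, transaction_id, service_name):
        # Look up the data by service name, answer the requester,
        # then synchronize the same data to every other child as a backup.
        data = self.service_data[service_name]
        requester = self.children_by_tx[transaction_id]
        requester.receive(service_name, data, backup=False)
        for child in self.children_by_tx.values():
            if child is not requester:
                child.receive(service_name, data, backup=True)
        return data


# Made-up sample data for illustration only.
b, c, d = Child("node-b"), Child("node-c"), Child("node-d")
master = Master(
    {"user_history_bill_query": ["2023-06: 58.00", "2023-07: 61.50"]},
    {"tx-001": b, "tx-002": c, "tx-003": d},
)
master.handle_subscription("tx-001", "user_history_bill_query")
```

After the call, the requesting child holds the bill information as delivered data, while the other two children hold it only as backups.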

FIG. 2 is a data processing flow chart for the case in which a network interruption at the master node causes the transmission link to fail. As shown in FIG. 2:

S201, acquiring transmission link fault data for a transmission link fault caused by a network interruption at the first master node;

S202, reducing the priority of the first master node, which has the highest priority, based on the transmission link fault data, and selecting the child node with the highest priority from the child node group as a second master node;

S203, distributing data in response to the service requests of the second other child nodes based on the second master node.

Specifically, when a network interruption at the master node causes the transmission link to fail, the priority of the master node is reduced and the child node with the highest priority is selected from the plurality of child nodes to serve as the master node. The new master node takes over task distribution and continues to distribute data in response to the service requests of the other child nodes, which prevents the data of the other child nodes from falling out of sync under network fluctuation.

Illustratively, priority may be evaluated by the weight value of each child node or by registration order; for example, the child node with the earliest registration time is taken as the master node.
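The demotion and reselection in steps S201 to S203 might be sketched in Python as below. The numeric rank scheme (0 is the highest priority), the `fail_over` function and the choice to place the failed master below every child node are assumptions made for illustration; the application does not specify how far the priority is reduced.

```python
def fail_over(priorities, failed_master):
    """Demote the failed master and promote the highest-priority child.

    priorities maps node ID -> rank, where rank 0 is the highest priority.
    Returns (new_master, new_priorities).
    """
    survivors = {n: p for n, p in priorities.items() if n != failed_master}
    if not survivors:
        raise RuntimeError("no child node available to take over")
    # Order the remaining nodes by their existing rank (node ID breaks ties).
    ordered = sorted(survivors, key=lambda n: (survivors[n], n))
    new_priorities = {n: rank for rank, n in enumerate(ordered)}
    # Assumption: the failed master is demoted below every child node.
    new_priorities[failed_master] = len(ordered)
    return ordered[0], new_priorities


second_master, updated = fail_over(
    {"node-a": 0, "node-b": 1, "node-c": 2}, failed_master="node-a"
)
```

Here `node-b` becomes the second master node and `node-a` drops to the lowest rank, so it does not reclaim the master role merely by reconnecting.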

FIG. 3 is a flow chart of data processing when a connection failure occurs in a master node according to an embodiment of the present application. As shown in FIG. 3:

S301, obtaining network connection stability monitoring data of connection faults of a first main node and a sub-node group based on a monitoring mechanism;

S302, obtaining preemptive master node requests sent by the child node groups to the distributed information model respectively based on network connection stability monitoring data;

S303, adjusting the child node which is successfully preempted to the highest priority based on the preemptive master node request to serve as a third master node, and synchronously transmitting service information to third other child nodes.

Specifically, the network connection stability of the master node may be monitored through a listening mechanism. When the master node has a connection fault, the child nodes each send a request to preempt the master node role to the distributed information model. The child node that preempts successfully first has its priority raised to the highest level and serves as the new master node, and the other child nodes continue to receive the service information synchronously sent by the new master node, which reduces the risk that the whole child node cluster goes down because of the master node fault.
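The "first successful preemption wins" behaviour can be illustrated with a minimal Python sketch. A lock-protected slot stands in for whatever atomic primitive the distributed information model provides; `MasterSlot` and `try_preempt` are hypothetical names, and the sketch runs in a single process rather than across a real cluster.

```python
import threading


class MasterSlot:
    """Holds at most one master; the first successful claim wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self.master = None

    def try_preempt(self, child_id):
        # The check and the assignment happen under one lock, so two
        # concurrent requests can never both succeed.
        with self._lock:
            if self.master is None:
                self.master = child_id
                return True
            return False


slot = MasterSlot()
# Requests arrive in this order once the master's connection fault is detected.
results = {c: slot.try_preempt(c) for c in ["node-c", "node-b", "node-d"]}
```

Only the first request succeeds, so `node-c` becomes the third master node and the later requesters remain child nodes.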

Illustratively, for the listening mechanism: corresponding listening events, such as child node data changes and child node state changes, are bound to each child node through the transaction ID of the child node, so that whether the network transmission link between the master node and the child node is abnormal can be learned in time.

Illustratively, for preempting the master node: according to the preemptive master node request, a weight is calculated for each child node from, for example, the quantity of user data stored by the child node or its proficiency in processing services, and the child node with the optimal weight is selected as the master node according to the weight calculation result.
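The weight calculation is not given a formula in the application. As one hedged possibility, the Python sketch below combines the two factors named above in a weighted sum; the coefficients 0.7 and 0.3, the normalization by the largest record count, the proficiency scale of 0 to 1, and the name `pick_by_weight` are all assumptions.

```python
def pick_by_weight(candidates, w_data=0.7, w_skill=0.3):
    """Pick the child node with the largest weight.

    candidates maps node ID -> (stored_user_records, proficiency in [0, 1]).
    Returns (best_node, weights).
    """
    # Normalize record counts so both factors lie on a comparable scale.
    max_records = max(records for records, _ in candidates.values()) or 1
    weights = {
        node: w_data * records / max_records + w_skill * skill
        for node, (records, skill) in candidates.items()
    }
    # Iterating over sorted node IDs makes ties resolve deterministically.
    best = max(sorted(weights), key=weights.get)
    return best, weights


# Made-up candidate statistics.
third_master, weights = pick_by_weight(
    {"node-b": (1000, 0.5), "node-c": (800, 0.9), "node-d": (200, 0.4)}
)
```

With these made-up figures, `node-b` scores 0.85 against 0.83 for `node-c`, so the larger store of user data outweighs the higher proficiency; different coefficients would change the outcome.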

Further, in the big data-based distributed information management method, local optimization may be performed on the child node group by multiple types of artificial bees in the artificial bee colony algorithm to obtain local optimization results; the local optimization results are then summarized into the optimization result of the whole bee colony and compared, and the child node with the highest priority according to the comparison result is taken as the second master node.

It can be understood that the application selects the master node through the artificial bee colony algorithm: multiple types of artificial bees perform local optimization on the child nodes, and the local optimization results are gathered across the whole bee colony and compared, so that the globally optimal child node is obtained as the master node. The artificial bee types may be classified by, for example, the service type of a child node or the child node's own attributes. Selecting the master node with the artificial bee colony algorithm allows it to be chosen rapidly when the network fails, with fast convergence and reduced time delay.
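The two-stage structure described here, a local best per bee type followed by a colony-wide comparison, can be sketched in Python as follows. This is deliberately not a full artificial bee colony implementation: it omits the employed, onlooker and scout phases and any randomized neighbourhood search. The `select_master` function, the bee type labels and the fitness scores are hypothetical.

```python
from collections import defaultdict


def select_master(children, fitness):
    """Two-stage selection: local best per bee type, then global comparison.

    children is a list of (node_id, bee_type) pairs;
    fitness maps node_id -> score, where a higher score is better.
    Returns (winner, local_best).
    """
    # Group the child nodes by the bee type assigned to them.
    groups = defaultdict(list)
    for node_id, bee_type in children:
        groups[bee_type].append(node_id)
    # Local optimization: each bee type keeps the best node in its own group.
    local_best = {
        bee_type: max(sorted(ids), key=fitness.__getitem__)
        for bee_type, ids in groups.items()
    }
    # Global comparison across the local results of the whole colony.
    winner = max(sorted(local_best.values()), key=fitness.__getitem__)
    return winner, local_best


# Made-up child nodes, classified by service type, with made-up scores.
children = [
    ("node-b", "billing"), ("node-c", "billing"),
    ("node-d", "query"), ("node-e", "query"),
]
fitness = {"node-b": 0.6, "node-c": 0.8, "node-d": 0.9, "node-e": 0.4}
new_master, local_best = select_master(children, fitness)
```

In this example the billing group nominates `node-c`, the query group nominates `node-d`, and the global comparison selects `node-d` as the new master node.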

In the big data-based distributed information management method of the embodiment of the present application, a distributed information model is built through Zookeeper, and service data is distributed between the master node and a plurality of child nodes through the distributed information model. When the master node fails, a globally optimal child node can be found among the plurality of child nodes through the artificial bee colony algorithm to serve as a new master node, and synchronous pushing of service data continues through the new master node, which reduces asynchronous data transmission errors caused by the master node failure, reduces data transmission delay, and improves the convergence speed of master node selection.

Example 2:

In order to implement the above embodiment, as shown in FIG. 4, there is further provided a distributed information management system 10 based on big data, the system 10 including: a node hierarchical dividing module 100, an information model construction module 200, a service request acquisition module 300 and a data synchronous transmission module 400;

the node hierarchical dividing module 100 is configured to obtain a first main node and a sub-node group based on a data cluster according to a priority of the data cluster system;

an information model construction module 200, configured to construct a distributed information model for service distribution according to the first master node and the child node group;

the service request acquisition module 300 is configured to acquire a service subscription request sent by a child node group to a first master node based on a distributed information model;

the data synchronization transmission module 400 is configured to feed back corresponding service data to a first child node in the child node group according to the service subscription request, and synchronously transmit the service data to a first other child node through the distributed information model, so as to obtain an information synchronization result of the data cluster and a synchronization transmission result of the service data.

The big data-based distributed information management system provided by the embodiment of the application builds a distributed information model through Zookeeper and distributes service data between the master node and a plurality of child nodes through the distributed information model. When the master node fails, a globally optimal child node can be found among the plurality of child nodes through the artificial bee colony algorithm to serve as a new master node, and synchronous pushing of service data continues through the new master node, which reduces asynchronous data transmission errors caused by the master node failure.

Example 3:

In order to implement the method of the above embodiment, the present application further provides a computer device. As shown in FIG. 5, the computer device 600 includes a memory 601 and a processor 602, wherein the processor 602 runs a program corresponding to executable program code stored in the memory 601 by reading the executable program code, so as to implement the steps of the above-described method.

In order to implement the above-described embodiments, the present application also proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements a method as described in the previous embodiments.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided that they do not contradict one another.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.

Claims

1. A distributed information management method based on big data, the method comprising:

2. The method of claim 1, wherein the first primary node is at a highest priority and the group of child nodes is at a next highest priority.

3. The method according to claim 2, wherein after obtaining the result of information synchronization of the data cluster and the result of synchronous transmission of the service data, the method further comprises:

4. A method according to claim 3, characterized in that the method further comprises:

5. A method according to claim 3, wherein selecting the highest priority child node from the group of child nodes as the second master node comprises:

6. The method of claim 4, wherein said constructing a distributed information model of traffic distribution from said first master node and child node group comprises:

7. The method of claim 6, wherein feeding back corresponding service data to a first child node in the group of child nodes according to the service subscription request, and transmitting the service data to the first other child node synchronously through the distributed information model, comprises:

8. The method of claim 1, wherein the prioritizing based on the data clusters is done by a chronological order of registration times.

9. The method of claim 6, wherein the obtaining network connection stability monitoring data for the connection failure of the first master node and the group of child nodes based on the listening mechanism comprises:

10. The method of claim 4, wherein the adjusting the child node which is successfully preempted to the highest priority based on the preemptive master node request to serve as the third master node comprises:

11. A big data based distributed information management system, comprising:

12. A computer device comprising a processor and a memory;

wherein the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for implementing the big data based distributed information management method according to any of claims 1-10.

13. A non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor implements the big data based distributed information management method according to any of claims 1-10.