CN113965576A

CN113965576A - Container-based big data acquisition method and device, storage medium and equipment

Info

Publication number: CN113965576A
Application number: CN202111402327.8A
Authority: CN
Inventors: 黄立超; 吴红
Original assignee: Hunan Happly Sunshine Interactive Entertainment Media Co Ltd
Current assignee: Hunan Happly Sunshine Interactive Entertainment Media Co Ltd
Priority date: 2021-11-19
Filing date: 2021-11-19
Publication date: 2022-01-21
Anticipated expiration: 2041-11-19
Also published as: CN113965576B

Abstract

The application discloses a container-based big data acquisition method, a container-based big data acquisition device, a storage medium and equipment. A distributed coordination service cluster is constructed with each service container group as a node. After the service data sent by the buried point component is received, the service data is distributed to the nodes. The data collection container in each node is triggered to write the service data distributed to that node into the storage resource of the node. The data transmission container in each node is then triggered to send the service data recorded in the storage resource of the node to a preset storage space. With this scheme, the service data is first stored in the storage resource and then asynchronously sent to the preset storage space, so that the data collected from the service system is not lost.

Description

Container-based big data acquisition method and device, storage medium and equipment

Technical Field

The present application relates to the field of big data services, and in particular, to a container-based big data collection method, apparatus, storage medium, and device.

Background

With the rapid development of internet technology, big data services are becoming increasingly common. To meet the mass data reporting requirement in a big data scenario, the mass data acquired by a data acquisition service needs to be sent to a preset storage space (such as a message queue or distributed storage) for storage. However, when the network fails or the preset storage space fails, some of the data collected from the service system cannot be reliably transmitted to the preset storage space, which results in data loss and affects subsequent big data services.

Therefore, how to ensure that the data collected from the business system is not lost when various abnormal conditions occur becomes a problem which needs to be solved urgently in the field.

Disclosure of Invention

The application provides a container-based big data acquisition method, a container-based big data acquisition device, a storage medium and equipment, which are used for ensuring that data acquired from a business system cannot be lost when various abnormal conditions occur.

In order to achieve the above object, the present application provides the following technical solutions:

a container-based big data collection method, comprising:

generating a plurality of service container groups based on the log collection service and the log transmission service; the service container group comprises a data collection container and a data transmission container;

configuring state information of each service container group; the state information includes storage resources;

constructing a distributed coordination service cluster according to each service container group as a node;

after receiving all service data sent by a buried point component preset in a service system, distributing all the service data to all nodes of the distributed coordination service cluster;

triggering a data collection container in each node, and writing the service data distributed to each node into a storage resource of each node;

and triggering a data transmission container in each node, and sending the service data recorded in the storage resource of each node to a preset storage space.

Optionally, the generating a plurality of service container groups based on the log collection service and the log transmission service includes:

packing the log collection service to obtain a data collection container, and packing the log transmission service to obtain a data transmission container;

and arranging the data collection container and the data transmission container to obtain a plurality of service container groups.

Optionally, after configuring the state information of each service container group, the method further includes:

arranging the state information of each service container group to obtain a stateful copy set;

and generating a configuration file based on the stateful copy set, and packaging the configuration file into a configuration mapping component.

Optionally, after the constructing a distributed coordination service cluster according to each of the service container groups as a node, the method further includes:

under the condition of receiving a service starting instruction input by a user, analyzing the service starting instruction to obtain registration information and system environment configuration of each service container group;

and initializing the registration information and the system environment configuration of each service container group into a node corresponding to each service container group.

Optionally, after the constructing a distributed coordination service cluster according to each of the service container groups as a node, the method further includes:

for each node in the distributed coordination service cluster, prompting a user that the service container group corresponding to the node fails when detecting that a preset check service does not receive the heartbeat packet and the check information sent by the node within a preset time period.

Optionally, the method further includes:

under the condition that the preset check service is detected to receive the heartbeat packet and the check information sent by the node within the preset time period, triggering the preset check service to compare the check information with the registration information initialized in advance by the node;

and prompting the user that the service container group corresponding to the node fails under the condition that the verification information is determined to be different from the registration information initialized in advance by the node.

Optionally, the method further includes:

carrying out health detection on a failed service container group, and judging whether the failed service container group is healthy or not;

if the failed service container group is healthy, restarting the failed service container group;

and if the failed service container group is unhealthy, deleting the failed service container group and recreating a new service container group.

A container-based big data collection device comprising:

a container generation unit for generating a plurality of service container groups based on a log collection service and a log transmission service; the service container group comprises a data collection container and a data transmission container;

a state configuration unit, configured to configure state information of each of the service container groups; the state information includes storage resources;

the cluster construction unit is used for constructing a distributed coordination service cluster according to the service container groups as nodes;

the data distribution unit is used for distributing each service data to each node of the distributed coordination service cluster after receiving each service data sent by a buried point component preset in a service system;

the data storage unit is used for triggering a data collection container in each node and writing the service data distributed to each node into the storage resource of each node;

and the data sending unit is used for triggering the data transmission container in each node and sending the service data recorded in the storage resource of each node to a preset storage space.

A computer-readable storage medium comprising a stored program, wherein the program performs the container-based big data collection method.

A container-based big data collection device comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the container-based big data acquisition method during running.

According to the technical scheme, a plurality of service container groups are generated based on log collection service and log transmission service, and state information of each service container group is configured. And constructing a distributed coordination service cluster according to each service container group as a node. After receiving each service data sent by a buried point component preset in a service system, distributing each service data to each node of the distributed coordination service cluster. And triggering a data collection container in each node, and writing the service data distributed to each node into a storage resource of each node. And triggering a data transmission container in each node, and sending the service data recorded in the storage resource of each node to a preset storage space. By using the scheme of the application, the service data is stored in the preset storage resource in advance based on the data collection container and the data transmission container in each service container group, and then the service data in the storage resource is asynchronously sent to the preset storage space, so that the data collected from the service system can be ensured not to be lost when various abnormal conditions occur.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a container-based big data collection method according to an embodiment of the present disclosure;

Fig. 2 is a schematic flow chart of another container-based big data collection method according to an embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of a container-based big data collection device according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

As shown in fig. 1, a schematic flow chart of a container-based big data collection method provided in an embodiment of the present application includes the following steps:

S101: Package the log collection service to obtain a data collection container, and package the log transmission service to obtain a data transmission container.

The log collection service is used for storing data, and the log transmission service is used for sending data.

S102: the data collection container and the data transmission container are arranged to obtain a plurality of service container groups (pod).

A preset container orchestration process may be called to arrange the data collection container and the data transmission container into a plurality of service pods. In the embodiment of the present application, the preset container orchestration process includes, but is not limited to, Kubernetes.

It should be noted that each service pod includes a data collection container and a data transmission container.

S103: configuring status information of each service pod.

The state information includes, but is not limited to, a sequence number, a storage resource, and a network identifier.

The sequence numbers of the service pods are consecutive, and specifically, assuming that the number of the service pods is 3, which are respectively the first service pod, the second service pod and the third service pod, the sequence number of the first service pod can be set to 1, the sequence number of the second service pod can be set to 2, and the sequence number of the third service pod can be set to 3.

The storage resource may be composed of a PersistentVolume (PV) and a PersistentVolumeClaim (PVC). A PV is a piece of network storage provisioned by an administrator in a cluster and may be regarded as a resource of the cluster. A PVC is a user's request for storage; it is similar to a container group (pod), except that a pod consumes node resources whereas a PVC consumes PV resources.

The network identifier is the label information, in the network, of the device (e.g., a service host) where the service pod is located; for a specific device, it identifies the IP address and the MAC address of that device in the network.

S104: Arrange the state information of each service pod to obtain a stateful copy set (StatefulSet).

A preset stateful application orchestration process may be called to arrange the state information of each service pod and obtain the stateful copy set. In the embodiment of the present application, the preset stateful application orchestration process includes, but is not limited to, the StatefulSet controller.
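As an illustration of S102 to S104, the following is a minimal sketch of how such a stateful copy set might be described for Kubernetes, written as a plain Python dictionary. The function name, container names, image names, mount path, and volume name are all hypothetical and do not come from the application; only the manifest field names follow the Kubernetes `apps/v1` StatefulSet schema. Note that Kubernetes numbers StatefulSet pods from 0 by default, whereas the example above numbers them from 1.

```python
def build_stateful_set(name, replicas, storage):
    """Build a StatefulSet manifest whose pods each hold two containers.

    Both containers mount the same per-pod volume, so the transmission
    container can read what the collection container has written.
    """
    labels = {"app": name}
    mount = [{"name": "spool", "mountPath": "/data/spool"}]  # hypothetical path
    containers = [
        # Hypothetical images standing in for the packaged services.
        {"name": "data-collection",
         "image": "example.invalid/log-collect:latest",
         "volumeMounts": mount},
        {"name": "data-transmission",
         "image": "example.invalid/log-transmit:latest",
         "volumeMounts": mount},
    ]
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": containers},
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "spool"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }
```

Because the claim is created from a template per pod, a restarted pod with the same ordinal is re-attached to the same claim, which is the property the description relies on for retaining unsent data.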

S105: based on the set of stateful copies, a configuration file is generated and encapsulated into a configuration mapping component (ConfigMap).

The ConfigMap is an API object used to store non-confidential data (specifically, the stateful copy set in this embodiment of the present application) as key-value pairs; it may be consumed as environment variables, command-line arguments, or configuration files in a storage volume. Generally, the ConfigMap decouples environment-specific configuration from the container, which makes the application configuration easier to modify. In addition, the specific implementation of encapsulating the configuration file into the ConfigMap is common knowledge familiar to those skilled in the art, and will not be described herein.
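As a rough sketch of S105, the configuration generated from the stateful copy set could be wrapped in a ConfigMap manifest as shown below. The function name and the key `statefulset.json` are assumptions made for illustration; the application does not specify the file format or the key names. The only Kubernetes facts relied on are the `v1` ConfigMap schema and the requirement that values under `data` be strings.

```python
import json


def build_config_map(name, config):
    """Wrap a configuration dictionary in a ConfigMap manifest.

    ConfigMap `data` values must be strings, so the configuration is
    serialized to JSON text under a single (hypothetical) key.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"statefulset.json": json.dumps(config, sort_keys=True)},
    }
```

A consumer can recover the original structure with `json.loads(config_map["data"]["statefulset.json"])`.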

S106: Take each service pod as a node to construct a distributed coordination service (ZooKeeper) cluster.

The ZooKeeper cluster comprises a plurality of nodes, the number of nodes is consistent with the number of service pods, and the node types include a master node and a slave node.

S107: Upon receiving a service start instruction input by a user, parse the service start instruction to obtain the registration information and the system environment configuration of each service pod.

S108: Initialize the registration information and the system environment configuration of each service pod into the node corresponding to that service pod.

The postStart callback function may be called to initialize the registration information and system environment configuration of the target service pod to the node corresponding to the target service pod.
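In Kubernetes, postStart is exposed as a container lifecycle hook. The sketch below, offered only as an assumption about how the callback might be wired up, attaches an exec-style postStart hook to a container specification held as a Python dictionary. The helper name and the registration script path in the usage note are hypothetical.

```python
import copy


def with_post_start(container, command):
    """Return a copy of a container spec with an exec postStart hook attached.

    The original dictionary is left untouched so the same base spec can be
    reused for containers that do not need the hook.
    """
    updated = copy.deepcopy(container)
    updated.setdefault("lifecycle", {})["postStart"] = {
        "exec": {"command": list(command)}
    }
    return updated
```

For example, `with_post_start(spec, ["/bin/sh", "-c", "/opt/register.sh"])` would run a (hypothetical) registration script right after the container starts.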

S109: after receiving each service data sent by a buried point component preset in a service system, distributing each service data to each node.

The preset load balancer can be called to distribute each service data to each service pod, and the service data received by each service pod is related to the computing resource of the machine to which each service pod belongs. Generally, the more computing resources of a machine to which a service pod belongs, the more business data distributed to the service pod, and the less computing resources of the machine to which the service pod belongs, the less business data distributed to the service pod.

It should be noted that the so-called buried point component (i.e., an event-tracking component embedded in the service system) is used for collecting service data generated by the service system in real time.
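The application does not name a specific balancing algorithm for S109. One way to distribute records in proportion to each pod's computing resources is smooth weighted round-robin, sketched below. The function name and the integer weights are assumptions; the weights stand in for whatever resource measure the load balancer actually uses.

```python
def distribute(records, node_weights):
    """Assign records to nodes in proportion to their weights.

    Uses smooth weighted round-robin: on every record each node's running
    score grows by its weight, the highest-scoring node is chosen, and its
    score is reduced by the total weight.
    """
    total = sum(node_weights.values())
    score = {node: 0 for node in node_weights}
    assignment = {node: [] for node in node_weights}
    for record in records:
        for node, weight in node_weights.items():
            score[node] += weight
        chosen = max(score, key=score.get)  # ties go to the first node listed
        score[chosen] -= total
        assignment[chosen].append(record)
    return assignment
```

With weights `{"pod-1": 4, "pod-2": 2, "pod-3": 2}` and eight records, `pod-1` receives four records and the other two pods receive two each, interleaved rather than in bursts.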

S110: Trigger the data collection container in each node to write the service data distributed to that node into the storage resource of the node.

S111: Trigger the data transmission container in each node to send the service data recorded in the storage resource of the node to a preset storage space.
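The store-then-forward behavior of S110 and S111 can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the application: the class name, the file names `spool.jsonl` and `sent.offset`, and the use of `ConnectionError` to signal an unavailable storage space are all hypothetical. The collection side persists each record before any network I/O; the transmission side sends unsent records and checkpoints its progress.

```python
import json
import os


class LocalSpool:
    """Append-only spool file plus a checkpoint of how many records were sent."""

    def __init__(self, directory):
        self.data_path = os.path.join(directory, "spool.jsonl")
        self.offset_path = os.path.join(directory, "sent.offset")

    def collect(self, record):
        # Data collection container: write to local storage first.
        with open(self.data_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def _sent_count(self):
        try:
            with open(self.offset_path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def forward(self, send):
        # Data transmission container: send everything after the checkpoint.
        sent = self._sent_count()
        if not os.path.exists(self.data_path):
            return sent
        with open(self.data_path, encoding="utf-8") as f:
            lines = f.readlines()
        for line in lines[sent:]:
            try:
                send(json.loads(line))
            except ConnectionError:
                break  # storage space unavailable; keep the data and retry later
            sent += 1
            with open(self.offset_path, "w", encoding="utf-8") as f:
                f.write(str(sent))
        return sent
```

Because the checkpoint is written after each successful send, a crash between the send and the checkpoint can cause one record to be sent twice; the sketch therefore gives at-least-once delivery. A replacement pod that mounts the same storage resource can call `forward` to resume from the last checkpoint.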

S112: For each node in the ZooKeeper cluster, when it is detected that the preset check service has not received the heartbeat packet and the check information sent by the node within a preset time period, prompt the user that the service pod corresponding to the node has failed.

S113: When it is detected that the preset check service has received the heartbeat packet and the check information sent by the node within the preset time period, trigger the preset check service to compare the check information with the registration information initialized in advance by the node.

S114: When the check information is determined to be different from the registration information initialized in advance by the node, determine that the service pod corresponding to the node has failed, and prompt the user of the failure.

Optionally, in a case that it is determined that the check information is the same as the registration information initialized in advance by the node, it is determined that the service pod corresponding to the node is normal.
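The checks in S112 to S114 can be illustrated with a small in-memory model. The class, its method names, and the status strings are hypothetical, and the dictionary stands in for the state that would be kept in the distributed coordination service; timestamps are passed explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class CheckService:
    """Tracks registration information and the latest heartbeat per node."""

    timeout_s: float
    registered: dict = field(default_factory=dict)
    last_seen: dict = field(default_factory=dict)

    def register(self, node, info, now):
        # Registration information initialized when the node starts (S108).
        self.registered[node] = info
        self.last_seen[node] = (now, info)

    def heartbeat(self, node, check_info, now):
        # Heartbeat packet carrying the node's current check information.
        self.last_seen[node] = (now, check_info)

    def status(self, node, now):
        seen_at, check_info = self.last_seen[node]
        if now - seen_at > self.timeout_s:
            return "faulty: heartbeat timeout"           # S112
        if check_info != self.registered[node]:
            return "faulty: check information mismatch"  # S113 and S114
        return "normal"
```

A node is reported as normal only if its heartbeat is recent and its check information matches what it registered.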

For the failed service pod, a preset pod probe process can be called, health detection is carried out on the failed service pod, and whether the failed service pod is healthy or not is judged; if the failed service pod is healthy, restarting the failed service pod; if the failed service pod is unhealthy, the failed service pod is deleted and a new service pod is created anew.

It should be noted that each time a service pod is restarted, the restarted service pod can still use the storage resource configured for that service pod, based on the preset stateful workload process.

Generally, when the total number of service pods detected as faulty exceeds a preset threshold, it is determined that an abnormal condition has occurred, that is, the faulty service pods cannot send service data to the preset storage space. Even so, the data collection container in each service pod still stores the service data in the storage resource. After the abnormal condition is eliminated, the data transmission container in the service pod sends the service data held in the storage resource to the preset storage space, thereby avoiding large-scale loss of service data.

Further, in the case where a failure in restart of the service pod is detected, the service pod is deleted and a new service pod is created anew. In the embodiment of the present application, a preStop callback function may be called to delete a service pod (i.e., so-called logoff processing).
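The recovery policy described above (probe the failed pod, restart it if it is healthy, otherwise delete and recreate it, with a failed restart also falling back to recreation) can be sketched as a single decision function. The function and its callable parameters are hypothetical placeholders for the pod probe, restart, and delete-and-recreate operations.

```python
def recover(pod, is_healthy, restart, recreate):
    """Apply the restart-or-recreate policy to one failed pod.

    is_healthy(pod) -> bool   health probe result
    restart(pod)    -> bool   True if the restart succeeded
    recreate(pod)             delete the pod and create a new one
    """
    if is_healthy(pod) and restart(pod):
        return "restarted"
    # Either the probe failed or the restart itself failed.
    recreate(pod)
    return "recreated"
```

In either outcome the pod's storage resource is expected to survive, so data that was collected but not yet sent remains available to the new or restarted pod.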

When a new service pod is detected in a node, the check information sent by the node needs to be obtained from the check service and parsed to determine whether all the service data in the storage resource of the node has been sent. If the service data has not been completely sent, the data transmission container in the new service pod is triggered to send the service data in the storage resource of the node to the preset storage space.

Based on the flow shown in S101-S114, a containerization scheme (i.e., service pods) is used to implement automatic deployment, updating, automatic scaling, and automatic failover of the service cluster. To keep service data acquisition stable when various abnormal conditions occur, the service data is first stored locally and then sent to the preset storage space. State management of the service pods is implemented by combining the ZooKeeper distributed coordination mechanism with the heartbeat mechanism of the service pods. In addition, the service data transmission process is verified periodically, and data consistency is ensured by comparing the check information reported by each node with the registration information of that node.

In summary, with the solution shown in this embodiment, based on the data collection container and the data transmission container in each service pod, the service data is stored in the preset storage resource in advance, and then the service data in the storage resource is asynchronously sent to the preset storage space, so that it is ensured that the data collected from the service system is not lost when various abnormal conditions occur.

It should be noted that S113 mentioned in the above embodiment is an optional implementation of the container-based big data collection method described in this application. S114 mentioned in the above embodiment is likewise an optional implementation of the method. Accordingly, the flow mentioned in the above embodiment can be summarized as the method shown in fig. 2.

As shown in fig. 2, a schematic flow chart of another container-based big data collection method provided in the embodiment of the present application includes the following steps:

S201: Generate a plurality of service container groups based on the log collection service and the log transmission service.

The service container group comprises a data collection container and a data transmission container.

S202: configuring state information of each service container group.

Wherein the state information includes storage resources.

S203: Construct a distributed coordination service cluster with each service container group as a node.

S204: After receiving the service data sent by a buried point component preset in a service system, distribute the service data to the nodes of the distributed coordination service cluster.

S205: Trigger the data collection container in each node to write the service data distributed to that node into the storage resource of the node.

S206: Trigger the data transmission container in each node to send the service data recorded in the storage resource of the node to a preset storage space.

In summary, with the solution shown in this embodiment, based on the data collection container and the data transmission container in each service container group, the service data is stored in the preset storage resource in advance, and then the service data in the storage resource is asynchronously sent to the preset storage space, so that it is ensured that the data collected from the service system is not lost when various abnormal conditions occur.

Corresponding to the container-based big data acquisition method provided by the embodiment of the application, the embodiment of the application also provides a container-based big data acquisition device.

As shown in fig. 3, an architecture diagram of a container-based big data collecting apparatus provided for an embodiment of the present application includes:

a container generating unit 100 for generating a plurality of service container groups based on a log collecting service and a log transmitting service; the service container group includes a data collection container and a data transmission container.

The container generation unit 100 is specifically configured to: packing the log collection service to obtain a data collection container, and packing the log transmission service to obtain a data transmission container; and arranging the data collection container and the data transmission container to obtain a plurality of service container groups.

A state configuration unit 200, configured to configure state information of each service container group; the state information includes storage resources.

The component packaging unit 300 is configured to arrange the state information of each service container group to obtain a stateful copy set, generate a configuration file based on the stateful copy set, and encapsulate the configuration file into a configuration mapping component.

The cluster building unit 400 is configured to build a distributed coordination service cluster according to each service container group as a node.

The information initialization unit 500 is configured to, in a case where a service start instruction input by a user is received, analyze the service start instruction to obtain registration information and system environment configuration of each service container group; and initializing the registration information and the system environment configuration of each service container group into a node corresponding to each service container group.

The node detection unit 600 is configured to, for each node in the distributed coordination service cluster, prompt a user that a service container group corresponding to the node fails when detecting that the preset check service does not receive the heartbeat packet and the check information sent by the node within a preset time period.

The information checking unit 700 is configured to trigger the preset checking service to compare the checking information with registration information initialized in advance by the node when detecting that the preset checking service receives the heartbeat packet and the checking information sent by the node within a preset time period; and prompting the user that the service container group corresponding to the node fails under the condition that the verification information is determined to be different from the registration information initialized in advance by the node.

A fault detection unit 800, configured to perform health detection on a faulty service container group, and determine whether the faulty service container group is healthy; if the failed service container group is healthy, restarting the failed service container group; if the failed service container group is unhealthy, deleting the failed service container group and re-creating a new service container group.

The data distribution unit 900 is configured to, after receiving each piece of service data sent by a buried point component preset in the service system, distribute each piece of service data to each node of the distributed coordination service cluster.

And the data storage unit 1000 is configured to trigger the data collection container in each node, and write the service data distributed to each node into the storage resource of each node.

A data sending unit 1100, configured to trigger a data transmission container in each node, and send the service data recorded in the storage resource of each node to a preset storage space.

The present application also provides a computer-readable storage medium comprising a stored program, wherein the program performs the container-based big data collection method provided by the present application.

The application also provides a big data acquisition equipment based on container, includes: a processor, a memory, and a bus. The processor is connected with the memory through a bus, the memory is used for storing programs, and the processor is used for running the programs, wherein when the programs are run, the container-based big data acquisition method provided by the application is executed, and the method comprises the following steps:

Specifically, on the basis of the above embodiment, the generating a plurality of service container groups based on the log collection service and the log transmission service includes:

Specifically, on the basis of the above embodiment, after configuring the state information of each service container group, the method further includes:

Specifically, on the basis of the above embodiment, after the constructing the distributed coordination service cluster by using each of the service container groups as a node, the method further includes:

Specifically, on the basis of the above embodiment, the method further includes:

The functions described in the method of the embodiment of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A container-based big data collection method, comprising:

2. The method of claim 1, wherein generating a plurality of service container groups based on the log collection service and the log transmission service comprises:

3. The method according to claim 1, wherein after configuring the status information of each of the service container groups, further comprising:

4. The method according to claim 1, wherein after the constructing the distributed coordination service cluster according to each of the service container groups as nodes, the method further comprises:

5. The method according to claim 1, wherein after the constructing the distributed coordination service cluster according to each of the service container groups as nodes, the method further comprises:

6. The method of claim 5, further comprising:

7. The method of claim 6, further comprising:

8. A container-based big data collection device, comprising:

9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program performs the container-based big data collection method of any one of claims 1 to 7.

10. A container-based big data collection device, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the container-based big data acquisition method according to any one of claims 1 to 7.