CN111782473A - Distributed log data processing method, device and system - Google Patents

Distributed log data processing method, device and system

Info

Publication number
CN111782473A
Authority
CN
China
Prior art keywords
log data
log
data
distributed
cleaning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010611847.9A
Other languages
Chinese (zh)
Inventor
周歆
王炳辉
易辛悦
章磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010611847.9A
Publication of CN111782473A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/865 Monitoring of software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/875 Monitoring of systems including the internet

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the present application provide a distributed log data processing method, device and system. The method comprises: receiving log data sent by each application node in a distributed system, and storing the log data in a corresponding temporary database according to a preset attribute identifier; and acquiring the log data from the corresponding temporary database according to the attribute identifier, performing data cleaning on the log data, and sending the cleaned log data to a preset log management center. The method, device and system can effectively improve the efficiency of collecting and processing log data across the whole distributed system.

Description

Distributed log data processing method, device and system
Technical Field
The present application relates to the field of data processing, and in particular, to a distributed log data processing method, apparatus, and system.
Background
With the rapid development of back-end service platforms, application servers generate a large volume of application logs as traffic increases. In the prior art, all application log data is sent to a unified log center for management, which frees up performance headroom on the application server and avoids abnormal application transactions caused by log collection occupying system resources in particular scenarios.
However, the raw log data is sent to the log center in full, without even simple filtering, which creates a performance bottleneck at the log center. Moreover, not all of the content in the raw log data needs to be sent to the log center for analysis, processing and query, so sending everything reduces the processing efficiency of the log center.
Disclosure of Invention
To address the problems in the prior art, the present application provides a distributed log data processing method, device and system that can effectively improve the efficiency of collecting and processing log data across the whole distributed system.
In order to solve at least one of the above problems, the present application provides the following technical solutions:
In a first aspect, the present application provides a distributed log data processing method, including:
receiving log data sent by each application node in a distributed system, and storing the log data in a corresponding temporary database according to a preset attribute identifier; and
acquiring the log data from the corresponding temporary database according to the attribute identifier, performing data cleaning on the log data, and sending the cleaned log data to a preset log management center.
Further, performing data cleaning on the log data includes:
determining component characteristic information in the log data, wherein the component characteristic information is generated by the log collector that each application node uses to send the log data; and
removing the component characteristic information from the log data to obtain log data from which the component characteristic information has been removed.
Further, performing data cleaning on the log data includes:
judging whether the log data contains noise data matching a preset noise field; and
if so, removing the noise data from the log data to obtain log data from which the noise data has been removed.
Further, before performing data cleaning on the log data, the method further includes:
merging, according to the service identifier in the log data, a plurality of pieces of log data having the same service identifier to obtain merged log data.
In a second aspect, the present application provides a distributed log data processing apparatus, including:
a log data channel establishing module, configured to receive log data sent by each application node in the distributed system and store the log data in a corresponding temporary database according to a preset attribute identifier; and
a log data pre-cleaning module, configured to acquire the log data from the corresponding temporary database according to the attribute identifier, perform data cleaning on the log data, and send the cleaned log data to a preset log management center.
Further, the log data pre-cleaning module includes:
a component characteristic information determining unit, configured to determine component characteristic information in the log data, where the component characteristic information is generated by the log collector that each application node uses to send the log data; and
a component characteristic information cleaning unit, configured to remove the component characteristic information from the log data to obtain log data from which the component characteristic information has been removed.
Further, the log data pre-cleaning module includes:
a noise field matching unit, configured to judge whether the log data contains noise data matching a preset noise field; and
a noise data cleaning unit, configured to, if the log data is judged to contain noise data matching the preset noise field, remove the noise data from the log data to obtain log data from which the noise data has been removed.
Further, the apparatus also includes:
a log data merging unit, configured to merge, according to the service identifier in the log data, a plurality of pieces of log data having the same service identifier to obtain merged log data.
In a third aspect, the present application provides a distributed log data processing system, including the application nodes in the distributed system, a log pre-processing node, and a log management center.
The log pre-processing node includes:
a log data channel establishing module, configured to receive log data sent by each application node in the distributed system and store the log data in a corresponding temporary database according to a preset attribute identifier; and
a log data pre-cleaning module, configured to acquire the log data from the corresponding temporary database according to the attribute identifier, perform data cleaning on the log data, and send the cleaned log data to the log management center.
In a fourth aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the distributed log data processing method when executing the program.
In a fifth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the distributed log data processing method described.
According to the above technical solution, a log processing node is added between the front-end application nodes and the back-end log management center (such as Elasticsearch or Kafka) in the distributed system, and the pressure of processing large volumes of log data is shifted to this log processing node. Large volumes of log data from different nodes are cached accurately in temporary databases distinguished by attribute identifiers, the log data is acquired in order for preliminary data cleaning, and the cleaned log data is transmitted to the back-end log management center, which improves the efficiency of collecting and processing log data as a whole.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of a distributed log data processing method in an embodiment of the present application;
Fig. 2 is a second schematic flowchart of a distributed log data processing method in an embodiment of the present application;
Fig. 3 is a third schematic flowchart of a distributed log data processing method in an embodiment of the present application;
Fig. 4 is a first structural diagram of a distributed log data processing apparatus in an embodiment of the present application;
Fig. 5 is a second structural diagram of a distributed log data processing apparatus in an embodiment of the present application;
Fig. 6 is a third structural diagram of a distributed log data processing apparatus in an embodiment of the present application;
Fig. 7 is a structural diagram of a distributed log data processing system in an embodiment of the present application;
Fig. 8 is a schematic flowchart of log data being accessed to a log management center in an embodiment of the present application;
Fig. 9 is a schematic flowchart of docking with the AMC alarm platform in an embodiment of the present application;
Fig. 10 is a schematic flowchart of a distributed log data processing method in an embodiment of the present application;
Fig. 11 is a schematic diagram of the architecture deployment of a log processing node in an embodiment of the present application;
Fig. 12 is a schematic diagram of a high-availability deployment of log processing nodes in an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the prior art, the raw log data is sent to the log center in full without even simple filtering, which creates a performance bottleneck at the log center; and because not all of the content in the raw log data needs to be sent to the log center for analysis, processing and query, the processing efficiency of the log center is also reduced. In view of this, the present application provides a distributed log data processing method, device and system. A log processing node is added between the front-end application nodes and the back-end log management center (such as Elasticsearch or Kafka) in the distributed system, and the pressure of processing large volumes of log data is shifted to this log processing node. Large volumes of log data from different nodes are cached accurately in temporary databases distinguished by attribute identifiers, the log data is acquired in order for preliminary data cleaning, and the cleaned log data is transmitted to the back-end log management center, which improves the efficiency of collecting and processing log data as a whole.
In order to effectively improve the efficiency of collecting and processing log data in the whole distributed system, the present application provides an embodiment of a distributed log data processing method, and referring to fig. 1, the distributed log data processing method specifically includes the following contents:
Step S101: receiving log data sent by each application node in the distributed system, and storing the log data in a corresponding temporary database according to a preset attribute identifier.
Optionally, the log data sent by each application node in the distributed system may be stored in a distributed publish-subscribe messaging system, namely a Kafka component. The Kafka component is preconfigured with different topics (that is, the attribute identifiers), so that the log data is stored in different temporary databases from which subsequent nodes can acquire it.
It can be understood that storing the log data in different temporary databases according to the preset attribute identifiers allows log data from different application nodes to be buffered in a unified manner, so that no log data acquisition anomalies arise when the log data is subsequently processed (for example, during data cleaning).
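Step S101 can be sketched as follows. This is a minimal illustration only: an in-memory buffer stands in for the Kafka topics, and the node names, topic names and the `TemporaryStore` class are hypothetical and do not appear in the application.

```python
from collections import defaultdict, deque

# Hypothetical mapping from application node to attribute identifier (topic name).
NODE_TOPICS = {
    "api-gateway-1": "logs.api",
    "api-gateway-2": "logs.api",
    "batch-worker-1": "logs.batch",
}
# Hypothetical fallback topic for nodes that have no preset identifier.
DEFAULT_TOPIC = "logs.unclassified"


class TemporaryStore:
    """In-memory stand-in for the per-topic temporary databases."""

    def __init__(self):
        self._buffers = defaultdict(deque)

    def store(self, node_id, record):
        """Append a record to the buffer selected by the node's attribute identifier."""
        topic = NODE_TOPICS.get(node_id, DEFAULT_TOPIC)
        self._buffers[topic].append(record)
        return topic

    def fetch(self, topic):
        """Pop the oldest record for a topic, or return None if nothing is buffered."""
        buffer = self._buffers.get(topic)
        return buffer.popleft() if buffer else None
```

Records from different nodes that share an attribute identifier land in the same buffer and are read back in arrival order, which is the property the subsequent cleaning step relies on.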
Step S102: acquiring the log data from the corresponding temporary database according to the attribute identifier, performing data cleaning on the log data, and sending the cleaned log data to a preset log management center.
Optionally, a log pre-processing node is provided that subscribes to the log data in the Kafka component, so that the corresponding log data can be obtained accurately from the Kafka component and given a preliminary data cleaning. The cleaned log data is then sent to the preset log management center, which relieves the processing load on the log management center and can improve the accuracy of its service processing of the log data.
As can be seen from the above description, in the distributed log data processing method provided in the embodiment of the present application, a log processing node is added between the front-end application nodes and the back-end log management center (such as Elasticsearch or Kafka) in the distributed system, and the pressure of processing large volumes of log data is shifted to this log processing node. Large volumes of log data from different nodes are cached accurately in temporary databases distinguished by attribute identifiers, the log data is acquired in order for preliminary data cleaning, and the cleaned log data is transmitted to the back-end log management center, which improves the efficiency of collecting and processing log data as a whole.
In order to perform preliminary data cleaning on log data before the log data is transmitted to the log management center, so as to reduce the operation pressure of the log management center, in an embodiment of the distributed log data processing method of the present application, referring to fig. 2, the step S102 may further specifically include the following steps:
Step S201: determining component characteristic information in the log data, wherein the component characteristic information is generated by the log collector that each application node uses to send the log data.
Step S202: removing the component characteristic information from the log data to obtain log data from which the component characteristic information has been removed.
Optionally, the application node may be configured with a log collector (for example, a Filebeat component) that collects and sends the log data. It can be understood that different log collectors generally add component characteristic information of their own, such as a timestamp and a log name, to the original log data. This component characteristic information plays no part in subsequent service processing based on the log data, so it can be removed at the log pre-processing node to clean the log data effectively, improve the accuracy of subsequent operations, and reduce the processing load on the log management center.
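Steps S201 and S202 can be sketched as follows, assuming the collector emits each event as a JSON object. Which top-level keys count as component characteristic information is an assumption made for this illustration; the application names only a timestamp and a log name as examples, and the function name is hypothetical.

```python
import json

# Assumption: these top-level keys are collector-added metadata in this deployment.
COLLECTOR_FIELDS = frozenset(
    {"@timestamp", "@metadata", "agent", "host", "input", "ecs", "log"}
)


def strip_collector_fields(raw_event):
    """Parse a JSON event and drop the collector-added top-level fields."""
    event = json.loads(raw_event)
    return {key: value for key, value in event.items() if key not in COLLECTOR_FIELDS}
```

Keeping the field list in one constant makes it straightforward to adjust per collector, since different collectors add different metadata.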
In order to perform preliminary data cleaning on log data before the log data is transmitted to the log management center, so as to reduce the operation pressure of the log management center, in an embodiment of the distributed log data processing method of the present application, referring to fig. 3, the step S102 may further specifically include the following steps:
Step S301: judging whether the log data contains noise data matching a preset noise field.
Step S302: if so, removing the noise data from the log data to obtain log data from which the noise data has been removed.
Optionally, the log data collected by the log collector of the application node contains multiple fields. In the present application, noise fields are set in advance and matched against all of the fields contained in the log data. A successful match indicates that the log data contains fields that are not needed downstream, that is, noise data, so the noise data can be removed at the log pre-processing node to clean the log data effectively, improve the accuracy of subsequent operations, and reduce the processing load on the log management center.
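Steps S301 and S302 can be sketched as follows, treating a log record as a dictionary of fields. The noise field names and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical preset noise fields; the application does not list concrete names.
NOISE_FIELDS = frozenset({"thread_name", "debug_trace", "class_path"})


def remove_noise(record, noise_fields=NOISE_FIELDS):
    """Return (cleaned_record, matched).

    matched is True when at least one preset noise field was present (step S301);
    in that case the matching fields are dropped from a copy (step S302).
    """
    matched = noise_fields.intersection(record)
    if not matched:
        return record, False
    cleaned = {key: value for key, value in record.items() if key not in matched}
    return cleaned, True
```

The input record is left unmodified, so the original can still be inspected if the cleaned version is rejected downstream.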
In order to accurately merge different log data belonging to the same service before data cleaning, so as to improve data cleaning efficiency and reduce the processing load on the log management center, in an embodiment of the distributed log data processing method of the present application, before performing data cleaning on the log data, the method further includes:
merging, according to the service identifier in the log data, a plurality of pieces of log data having the same service identifier to obtain merged log data.
Optionally, before the preliminary data cleaning, multiple related pieces of log data may be merged according to the service identifier in the log data. That is, multiple pieces of log data having the same service identifier (and therefore belonging to the same service) are merged according to a preset data format, which reduces the number of log records and improves the efficiency of subsequent operations.
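The merging step can be sketched as follows. The record layout (`service_id`, `message`) and the merged format (one record per service identifier with a list of messages) are assumptions, since the application does not specify the preset data format.

```python
def merge_by_service_id(records):
    """Merge records sharing a service_id into one record each.

    Output order follows the first appearance of each service_id, and the
    messages inside a merged record keep their arrival order.
    """
    merged = {}
    for record in records:
        service_id = record["service_id"]
        entry = merged.setdefault(
            service_id, {"service_id": service_id, "messages": []}
        )
        entry["messages"].append(record["message"])
    return list(merged.values())
```

With this layout, three input records covering two services become two merged records, which is the reduction in record count the paragraph above describes.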
In order to effectively improve the efficiency of collecting and processing log data in the whole distributed system, the present application provides an embodiment of a distributed log data processing apparatus for implementing all or part of the contents of the distributed log data processing method, and referring to fig. 4, the distributed log data processing apparatus specifically includes the following contents:
The log data channel establishing module 10 is configured to receive log data sent by each application node in the distributed system, and store the log data in a corresponding temporary database according to a preset attribute identifier.
The log data pre-cleaning module 20 is configured to acquire the log data from the corresponding temporary database according to the attribute identifier, perform data cleaning on the log data, and send the cleaned log data to a preset log management center.
As can be seen from the above description, in the distributed log data processing apparatus provided in the embodiment of the present application, a log processing node is added between the front-end application nodes and the back-end log management center (such as Elasticsearch or Kafka) in the distributed system, and the pressure of processing large volumes of log data is shifted to this log processing node. Large volumes of log data from different nodes are cached accurately in temporary databases distinguished by attribute identifiers, the log data is acquired in order for preliminary data cleaning, and the cleaned log data is transmitted to the back-end log management center, which improves the efficiency of collecting and processing log data as a whole.
In order to perform preliminary data cleaning on log data before it is transmitted to the log management center, so as to reduce the processing load on the log management center, in an embodiment of the distributed log data processing apparatus of the present application, referring to fig. 5, the log data pre-cleaning module 20 includes:
a component characteristic information determining unit 21, configured to determine component characteristic information in the log data, where the component characteristic information is generated by the log collector that each application node uses to send the log data; and
a component characteristic information cleaning unit 22, configured to remove the component characteristic information from the log data to obtain log data from which the component characteristic information has been removed.
In order to perform preliminary data cleaning on log data before it is transmitted to the log management center, so as to reduce the processing load on the log management center, in an embodiment of the distributed log data processing apparatus of the present application, referring to fig. 6, the log data pre-cleaning module 20 includes:
a noise field matching unit 23, configured to judge whether the log data contains noise data matching a preset noise field; and
a noise data cleaning unit 24, configured to, if the log data is judged to contain noise data matching the preset noise field, remove the noise data from the log data to obtain log data from which the noise data has been removed.
In order to accurately merge different log data belonging to the same service before data cleaning, so as to improve data cleaning efficiency and reduce the processing load on the log management center, in an embodiment of the present application, the distributed log data processing apparatus further includes:
a log data merging unit, configured to merge, according to the service identifier in the log data, a plurality of pieces of log data having the same service identifier to obtain merged log data.
In order to further explain the present solution, the present application further provides a specific application example of a distributed log data processing system that uses the distributed log data processing apparatus to implement the distributed log data processing method, see fig. 7, which specifically includes each application node, a log pre-processing node, and a log management center in the distributed system;
The log pre-processing node includes:
a log data channel establishing module, configured to receive log data sent by each application node in the distributed system and store the log data in a corresponding temporary database according to a preset attribute identifier; and
a log data pre-cleaning module, configured to acquire the log data from the corresponding temporary database according to the attribute identifier, perform data cleaning on the log data, and send the cleaned log data to the log management center.
Fig. 8 shows how the API platform accesses the log management center in the prior art.
Specifically, a Filebeat component is deployed on each application node, and the component's high performance and low overhead are used to send the logs in full to the in-house log center. The logs are not processed at the application node; their entire content is sent to the log center unchanged. In practice this places a great deal of pressure on the log center.
Fig. 9 shows the prior-art approach of docking with the AMC application to raise alarms based on log statistics.
As fig. 8 and fig. 9 show, the platform sends logs to the log center and to Kafka separately to drive the subsequent log processing flows. In the present solution, a log processing node is added after the application nodes to perform preliminary processing and filtering of the logs, which are then forwarded to whichever downstream Kafka or Elasticsearch they need to be delivered to. The flow after the solution is implemented is shown in fig. 10.
As shown there, the Filebeat component deployed on each node sends that node's complete logs to Kafka for buffering. The log processing node subscribes to the logs of all nodes, determines the type of each log, applies simple filtering, and routes the logs to the back-end log center ES and the alarm Kafka cluster.
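The routing performed by the log processing node can be sketched as follows. The `type` field, the `"alarm"` value and the sink names are hypothetical; the application states only that the node judges the log type and routes logs to the log center ES and the alarm Kafka cluster.

```python
def route(record):
    """Pick a downstream sink from the record's type (illustrative rule only)."""
    if record.get("type") == "alarm":
        return "alarm_kafka"
    return "log_center_es"


def dispatch(records):
    """Group records by the sink chosen for each of them, preserving order."""
    sinks = {"alarm_kafka": [], "log_center_es": []}
    for record in records:
        sinks[route(record)].append(record)
    return sinks
```

In a real deployment each list would be replaced by a producer or client for the corresponding system; plain lists keep the sketch self-contained.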
The log processing node is developed on the real-time stream computing platform Flink. Flink has the following advantages:
1. Flink can provide accurate results even when data arrives out of order or late.
2. Flink is stateful and fault tolerant, and can recover seamlessly from failures while maintaining exactly-once application state.
3. Flink is a distributed computing framework that supports large-scale operation and horizontal scaling, and achieves good throughput and low latency when running on multiple nodes.
With the Flink framework, the logs can also be given a preliminary filtering. The Filebeat component attaches a good deal of additional content when it transmits logs, and in the scenarios currently collected this added content does not help subsequent data analysis. The log processing node can therefore perform simple cleaning of the log content to distill the core content of each log, which reduces the pressure on the back end of receiving a large volume of logs.
With preliminary cleaning at the log processing node, the log volume can be reduced to about 60% of its pre-processing size. The log processing stage thus relieves pressure both on the front-end application nodes and on the back-end nodes that receive the logs.
The structural deployment of the log processing node is explained below. The log processing node is built on the Flink architecture and deployed in Flink-Standalone-Cluster-HA mode. A ZooKeeper cluster is connected to ensure high availability of the cluster, and containerized deployment on the cloud ensures automatic and horizontal scaling of the nodes under high load. Refer to fig. 11 and fig. 12.
Finally, the high availability of Flink-Standalone-HA is explained.
When the Flink cluster starts, a JobManager and one or more TaskManagers are started first. The client submits tasks to the JobManager, the JobManager dispatches the tasks to the TaskManagers for execution, and the TaskManagers report heartbeats and statistics back to the JobManager. Standalone-HA means that multiple JobManagers are started on multiple servers, and distributed coordination among all running JobManager instances is carried out through the ZooKeeper ensemble configured for Flink. ZooKeeper provides a highly available distributed coordination service through leader election and lightweight consistent state storage.
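The leader election idea can be illustrated with a toy model. It is loosely based on the common ZooKeeper recipe in which each candidate holds a sequence number and the live candidate with the lowest number leads; it is not Flink's actual implementation, and the class and instance names are hypothetical.

```python
class JobManagerGroup:
    """Toy model of leader election among JobManager instances."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # instance name -> sequence number

    def register(self, name):
        """Add a running instance; later registrations get higher numbers."""
        self._candidates[name] = self._next_seq
        self._next_seq += 1

    def fail(self, name):
        """Remove an instance, as if its coordination session had expired."""
        self._candidates.pop(name, None)

    def leader(self):
        """Return the live instance with the lowest sequence number, if any."""
        if not self._candidates:
            return None
        return min(self._candidates, key=self._candidates.get)
```

When the current leader fails, the next-oldest standby takes over, and an instance that restarts rejoins as a standby rather than reclaiming leadership.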
As can be seen from the above, the present application can achieve at least the following technical effects:
By adding a log processing node between the front-end application nodes and the back-end log data endpoint (such as Elasticsearch or Kafka), the pressure of processing large volumes of log data is shifted to the log processing node. The log processing node is built on the Flink real-time stream computing platform and, by virtue of that platform's characteristics, supports horizontal scaling while remaining highly available. This matches the usual approach in distributed systems of coping with explosive growth in data volume by adding nodes horizontally. According to a preliminary estimate from testing, after the method is adopted, the server resource consumption of the front log processing node is reduced by 30% and the volume of data received by the back-end log data endpoint is reduced by 40%, making the overall log data collection process more reasonable and efficient.
In terms of hardware, in order to effectively improve the efficiency of collecting and processing log data across the whole distributed system, the present application provides an embodiment of an electronic device for implementing all or part of the distributed log data processing method. The electronic device specifically includes:
a processor, a memory, a communication interface, and a bus. The processor, the memory and the communication interface communicate with one another through the bus. The communication interface is used for information transmission between the distributed log data processing apparatus and related equipment such as a core service system, a user terminal and a related database. The electronic device may be a desktop computer, a tablet computer, a mobile terminal, or the like, but this embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiments of the distributed log data processing method and of the distributed log data processing apparatus, the contents of which are incorporated herein; repeated descriptions are omitted.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), an in-vehicle device, a smart wearable device, and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, and the like.
In practical applications, part of the distributed log data processing method may be executed on the electronic device side as described above, or all operations may be completed in the client device. The choice may be made according to the processing capability of the client device, the constraints of the user's usage scenario, and the like; the present application does not limit this. If all operations are performed in the client device, the client device may further include a processor.
The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
Fig. 13 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 13, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 13 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the distributed log data processing method functions may be integrated into the central processor 9100. The central processor 9100 may be configured to control as follows:
Step S101: receiving log data sent by each application node in the distributed system, and storing the log data into a corresponding temporary database according to a preset attribute identifier.
Step S102: acquiring the log data from the corresponding temporary database according to the attribute identifier, performing data cleaning on the log data, and sending the cleaned log data to a preset log management center.
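Steps S101 and S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: in-memory queues stand in for the temporary databases, the field `app_id` is assumed to be the preset attribute identifier, and `clean` and the list `log_center` are hypothetical placeholders for the data cleaning and the log management center.

```python
from collections import defaultdict, deque

# Temporary databases, one per attribute identifier (assumed here to be "app_id").
temp_dbs = defaultdict(deque)
# Hypothetical stand-in for the log management center.
log_center = []


def receive(record):
    """Step S101: store an incoming record in the buffer matching its identifier."""
    temp_dbs[record["app_id"]].append(record)


def clean(record):
    """Placeholder cleaning: trim whitespace and drop records with empty messages."""
    message = record["message"].strip()
    return {**record, "message": message} if message else None


def process(app_id):
    """Step S102: drain one buffer in arrival order, clean, and forward."""
    buffer = temp_dbs[app_id]
    while buffer:
        cleaned = clean(buffer.popleft())
        if cleaned is not None:
            log_center.append(cleaned)


receive({"app_id": "payments", "message": "  order 42 accepted  "})
receive({"app_id": "accounts", "message": "login ok"})
receive({"app_id": "payments", "message": "   "})

process("payments")
process("accounts")
print(log_center)
```

Keeping one buffer per identifier means records from different application nodes never interleave inside a buffer, so each stream can be cleaned in the order in which it arrived.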
As can be seen from the above description, in the electronic device provided in the embodiment of the present application, a log processing node is added between the front-end application nodes and the back-end log management center (e.g., Elasticsearch, Kafka, etc.) in a distributed system, so that the log processing load previously caused by large data volumes is transferred to the log processing node. The large volume of log data from different nodes is cached accurately in temporary databases distinguished by attribute identifier, the log data is then acquired in order and pre-cleaned, and the cleaned log data is transmitted to the back-end log management center, which improves the efficiency of collecting and processing log data as a whole.
In another embodiment, the distributed log data processing apparatus may be configured separately from the central processor 9100, for example, the distributed log data processing apparatus may be configured as a chip connected to the central processor 9100, and the functions of the distributed log data processing method may be implemented by the control of the central processor.
As shown in Fig. 13, the electronic device 9600 may further include a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 does not necessarily include all of the components shown in Fig. 13; in addition, the electronic device 9600 may include components not shown in Fig. 13, for which reference may be made to the prior art.
As shown in fig. 13, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store failure-related information as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to implement information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid state memory, e.g., Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information when powered off, can be selectively erased, and can be written with further data; an example of such a memory is sometimes called an EPROM. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which stores application programs and function programs, or the flows by which the central processor 9100 carries out the operations of the electronic device 9600.
The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all the steps of the distributed log data processing method of the foregoing embodiments in which the execution subject is a server or a client. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all of those steps; for example, when executing the computer program, the processor implements the following steps:
Step S101: receiving log data sent by each application node in the distributed system, and storing the log data into a corresponding temporary database according to a preset attribute identifier.
Step S102: acquiring the log data from the corresponding temporary database according to the attribute identifier, performing data cleaning on the log data, and sending the cleaned log data to a preset log management center.
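The data cleaning in step S102 can be illustrated in more detail. The sketch below follows the three operations recited in claims 2 to 4: merging log data that share a service identifier, removing the component characteristic information added by the log collector, and removing data matching preset noise fields. All field names (`service_id`, `message`, `agent`, `@metadata`, `debug_trace`) and the merge rule are assumptions made for illustration; the application does not specify them.

```python
from collections import defaultdict

# Assumed names of fields generated by the log collector (component characteristic info).
COLLECTOR_FIELDS = {"@metadata", "agent"}
# Assumed preset noise fields.
NOISE_FIELDS = {"debug_trace"}


def merge_by_service(records):
    """Merge records that share a service identifier, keeping arrival order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["service_id"]].append(record)
    merged = []
    for group in grouped.values():
        # Assumed merge rule: keep the first record's fields, join all messages.
        combined = dict(group[0])
        combined["message"] = "\n".join(r["message"] for r in group)
        merged.append(combined)
    return merged


def strip_fields(record, fields):
    """Return a copy of the record without the given fields, if present."""
    return {k: v for k, v in record.items() if k not in fields}


def clean(records):
    """Merge first, then remove collector information and noise fields."""
    result = []
    for record in merge_by_service(records):
        record = strip_fields(record, COLLECTOR_FIELDS)
        record = strip_fields(record, NOISE_FIELDS)
        result.append(record)
    return result


raw = [
    {"service_id": "tx-1", "message": "begin", "agent": "collector-a"},
    {"service_id": "tx-2", "message": "ping", "debug_trace": "..."},
    {"service_id": "tx-1", "message": "commit", "@metadata": {"v": 1}},
]
cleaned = clean(raw)
print(cleaned)
```

Merging before cleaning mirrors claim 4, which places the log merging ahead of the data cleaning.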
As can be seen from the above description, in the computer-readable storage medium provided in this embodiment of the present application, a log processing node is added between the front-end application nodes and the back-end log management center (e.g., Elasticsearch, Kafka, etc.) in a distributed system, so that the log processing load previously caused by large data volumes is transferred to the log processing node. The large volume of log data from different nodes is cached accurately in temporary databases distinguished by attribute identifier, the log data is then acquired in order and pre-cleaned, and the cleaned log data is transmitted to the back-end log management center, which improves the efficiency of collecting and processing log data as a whole.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments have been used herein to explain the principles and implementations of the invention, and the description of the embodiments is intended only to aid understanding of the method and core idea of the invention. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the contents of this specification should not be construed as limiting the invention.

Claims (11)

1. A distributed log data processing method, the method comprising:
receiving log data sent by each application node in a distributed system, and storing the log data into a corresponding temporary database according to a preset attribute identifier;
and acquiring the log data from the corresponding temporary database according to the attribute identification, performing data cleaning on the log data, and sending the log data subjected to the data cleaning to a preset log management center.
2. The distributed log data processing method of claim 1, wherein the performing data cleansing on the log data comprises:
determining component characteristic information in the log data, wherein the component characteristic information is generated by a log collector used when each application node sends the log data;
and removing the component characteristic information from the log data to obtain the log data from which the component characteristic information is removed.
3. The distributed log data processing method of claim 1, wherein the performing data cleansing on the log data comprises:
judging whether the log data contains noise data matched with a preset noise field or not;
and if so, removing the noise data from the log data to obtain the log data with the noise data removed.
4. The distributed log data processing method of claim 1, further comprising, before the performing data cleansing on the log data:
and according to the service identification in the log data, carrying out log merging processing on a plurality of log data with the same service identification to obtain the log data subjected to the log merging processing.
5. A distributed log data processing apparatus, comprising:
the log data channel establishing module is used for receiving log data sent by each application node in the distributed system and storing the log data into a corresponding temporary database according to a preset attribute identifier;
and the log data pre-cleaning module is used for acquiring the log data from the corresponding temporary database according to the attribute identification, cleaning the data of the log data and sending the log data subjected to data cleaning to a preset log management center.
6. The distributed log data processing apparatus of claim 5, wherein the log data pre-cleaning module comprises:
a component characteristic information determining unit, configured to determine component characteristic information in the log data, where the component characteristic information is generated by a log collector used when each application node sends the log data;
and the component characteristic information cleaning unit is used for removing the component characteristic information from the log data to obtain the log data from which the component characteristic information is removed.
7. The distributed log data processing apparatus of claim 5, wherein the log data pre-cleaning module comprises:
the noise field matching unit is used for judging whether the log data contains noise data matched with a preset noise field;
and the noise data cleaning unit is used for removing the noise data from the log data to obtain the log data after the noise data is removed if the log data is judged to contain the noise data matched with a preset noise field.
8. The distributed log data processing apparatus of claim 5, further comprising:
and the log data merging unit is used for performing log merging processing on a plurality of log data with the same service identifier according to the service identifier in the log data to obtain the log data subjected to the log merging processing.
9. A distributed log data processing system is characterized by comprising application nodes, a log pre-processing node and a log management center in the distributed system;
the log pre-processing node comprises:
the log data channel establishing module is used for receiving log data sent by each application node in the distributed system and storing the log data into a corresponding temporary database according to a preset attribute identifier;
and the log data pre-cleaning module is used for acquiring the log data from the corresponding temporary database according to the attribute identification, cleaning the data of the log data and sending the log data subjected to data cleaning to the log management center.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the distributed log data processing method of any one of claims 1 to 4 are implemented by the processor when executing the program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the distributed log data processing method of any one of claims 1 to 4.
CN202010611847.9A 2020-06-30 2020-06-30 Distributed log data processing method, device and system Pending CN111782473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010611847.9A CN111782473A (en) 2020-06-30 2020-06-30 Distributed log data processing method, device and system

Publications (1)

Publication Number Publication Date
CN111782473A true CN111782473A (en) 2020-10-16

Family

ID=72760415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010611847.9A Pending CN111782473A (en) 2020-06-30 2020-06-30 Distributed log data processing method, device and system

Country Status (1)

Country Link
CN (1) CN111782473A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112506954A (en) * 2020-12-25 2021-03-16 新浪网技术(中国)有限公司 Database auditing method and device
CN112948845A (en) * 2021-02-01 2021-06-11 航天科技控股集团股份有限公司 Data processing method and system based on Internet of things data center
CN113254308A (en) * 2021-05-19 2021-08-13 中国联合网络通信集团有限公司 Log processing method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105262812A (en) * 2015-10-16 2016-01-20 浪潮(北京)电子信息产业有限公司 Log data processing method based on cloud computing platform, log data processing device and log data processing system
TW201828175A (en) * 2017-01-19 2018-08-01 阿里巴巴集團服務有限公司 Log data processing method and apparatus improving the scalability and processing performance of the distributed database system in terms of the transaction logs
CN110209518A (en) * 2019-04-26 2019-09-06 福州慧校通教育信息技术有限公司 A kind of multi-data source daily record data, which is concentrated, collects storage method and device

Similar Documents

Publication Publication Date Title
CN111782473A (en) Distributed log data processing method, device and system
CN111090699A (en) Service data synchronization method and device, storage medium and electronic device
CN111782470A (en) Distributed container log data processing method and device
CN110764881A (en) Distributed system background retry method and device
CN106815254A (en) A kind of data processing method and device
CN103139157A (en) Network communication method based on socket, device and system
CN112769945B (en) Distributed service calling method and device
CN113435989A (en) Financial data processing method and device
CN112181678A (en) Service data processing method, device and system, storage medium and electronic device
CN111259066A (en) Server cluster data synchronization method and device
CN114237896A (en) Distributed node resource dynamic scheduling method and device
CN114741400A (en) Data synchronization and analysis method, device and terminal equipment
CN112910708B (en) Distributed service calling method and device
CN107249019A (en) Data handling system, method, device and server based on business
CN116456496B (en) Resource scheduling method, storage medium and electronic equipment
CN112396511A (en) Distributed wind control variable data processing method, device and system
CN115914375A (en) Disaster tolerance processing method and device for distributed message platform
CN111190731A (en) Cluster task scheduling system based on weight
CN104079368B (en) A kind of the test data transmission method and server of application software
CN115562898A (en) Distributed payment system exception handling method and device
CN114661563A (en) Data processing method and system based on stream processing framework
CN114374614A (en) Network topology configuration method and device
CN111061518B (en) Data processing method, system, terminal equipment and storage medium based on drive node
CN113918436A (en) Log processing method and device
CN108805741B (en) Fusion method, device and system of power quality data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination