CN117389825A - Method, system and device for monitoring Flink job log in real time - Google Patents
- Publication number
- CN117389825A (application number CN202311290323.4A)
- Authority
- CN
- China
- Prior art keywords
- log
- job
- flink
- real
- context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3089—Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
- G06F11/3093—Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method, a system and a device for monitoring a Flink job log in real time, belonging to the technical field of program data processing, wherein the method comprises the following steps: 1) Constructing the Flink job log, including the core logic implementing the Logback log context selector interface and a standard log structure defined by a custom log layout; 2) Synchronizing the Flink job logs, including dependency adjustment, log configuration adjustment and Flink configuration adjustment, to achieve real-time synchronization of the logs to Kafka; 3) Monitoring the logs in real time, including screening and filtering logs through the configuration of Flink log filtering rules, persistently storing the Flink logs for analysis and audit, and integrating mail to deliver real-time early-warning notifications. The invention greatly reduces the time and labor cost of Flink operation and maintenance, and enables real-time status monitoring of Flink jobs based on the logs.
Description
Technical Field
The invention relates to the technical field of program data processing, in particular to a method, a system and a device for monitoring a Flink job log in real time.
Background
Flink is a powerful big data processing framework with the advantages of high throughput, low latency, fault tolerance and flexibility. It is suited to scenarios that require real-time response and can automatically restore the computation state when a node fails; in addition, Flink supports rich stream processing operators and functions, such as window operations, state management and Exactly-Once semantics. With the continuous development of big data technology, Flink has gained more and more attention and application in enterprises and academia. It can be used in scenarios such as real-time data analysis, real-time recommendation systems, complex event processing, real-time dashboards and data pipelines. The high scalability and flexibility of Flink enable it to accommodate application scenarios of different scales and requirements.
Log analysis is well known as an important means of problem investigation and security auditing. When Flink is used in production (with programs submitted as remote tasks in the yarn-session running mode), the logs of different Flink jobs are printed together, and it is impossible to tell which job a log entry belongs to. This makes it difficult to investigate Flink job problems and, to a certain extent, to monitor job logs, and no good solution is currently available.
Disclosure of Invention
The technical task of the invention is to address the above defects by providing a method, a system and a device for monitoring the Flink job log in real time, which greatly reduce the time and labor cost of Flink operation and maintenance and enable real-time status monitoring of Flink jobs based on the logs.
The technical scheme adopted for solving the technical problems is as follows:
a method for monitoring a Flink job log in real time includes the following steps:
1) Constructing a Flink job log, wherein the Flink job log comprises a core logic for realizing a Logback log context selector interface and a standard log structure defined by a custom log layout;
2) Synchronizing the Flink job logs, including dependency adjustment, log configuration adjustment and Flink configuration adjustment, to achieve real-time synchronization of the logs to Kafka;
3) Monitoring the logs in real time, including screening and filtering logs through the configuration of Flink log filtering rules, persistently storing the Flink logs for analysis and audit, and integrating notification channels such as mail and SMS to deliver real-time early-warning notifications.
Logback is a logging framework that succeeds log4j; it occupies less space than other existing logging frameworks and also provides functionality that some of them do not.
Kafka is a distributed, publish/subscribe based messaging system. It can process all the activity-stream data of consumers on a website.
Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink is designed to run in all common cluster environments, performing computations at in-memory speed and at any scale.
The method implements the Logback log context selector interface, which makes it possible to construct a different log context for each task job and to store the job ID and job name in the log context; then the key log information is captured by the custom log layout and the log is synchronized to Kafka through the KafkaAppender component; finally, the Flink job log can be obtained in real time via Kafka message subscription and monitored and analyzed. A general log filtering rule configuration engine can be built on top of the fixed-structure log information, users can configure custom log filtering rules, and logs matching the filtering rules can be persistently stored to facilitate analysis and audit.
Preferably, for the core logic of the Logback log context selector interface,
Logback provides the ch.qos.logback.classic.selector.ContextSelector interface to manage different log contexts, and the core logic with which the interface obtains a log context is as follows:
1.1 Defining a log context storage map,
1.2 Judging whether the current thread is an instance of org.apache.flink.runtime.taskmanager.Task; if yes, executing step 1.3); otherwise, returning the default log context instance;
1.3 Obtaining the current job ID;
1.4 Judging whether a log context instance of the job exists in the log context storage map; if yes, executing step 1.5); otherwise, creating a job log context instance, setting its job ID and job name attributes, storing it in the log context storage map, and executing step 1.5);
1.5 Obtaining a log context instance of the current job from the log context storage map;
the above steps realize that the job ID and the job name are saved in the log context.
Preferably, for the standard log structure defined by the custom log layout,
Logback provides the ch.qos.logback.core.LayoutBase abstract class to support custom extension of the log layout, and the abstract class's log layout method (doLayout) is implemented according to the following log structure:
context: the context, recording the log context name;
flinkJobId: the Flink job ID, recording the Flink job ID to which the log belongs;
flinkJobName: the Flink job name, recording the Flink job name to which the log belongs;
time: the log time, recording the time at which the log was generated;
level: the log level, recording the log level;
thread: the thread, recording the thread that generated the log;
logger: the logger, recording the name of the logger to which the log belongs;
message: the log message, recording the detailed log information;
thf: the exception stack, recording the exception stack information of the log;
the implementation class is denoted as a.b.c.FlinkLogLayout;
finally, the two classes implemented in the step of constructing the Flink job log need to be built into a jar package for subsequent operations; the built jar package is denoted flink-log-1.0.0.jar.
Preferably, for the dependency adjustment,
the log4j dependency packages are first removed from the lib directory under the Flink deployment path, and then the following jar packages are added to the lib directory:
kafka-clients-2.5.0.jar: the Kafka client connection dependency package,
logback-classic-1.2.3.jar: a Logback-related dependency package,
logback-core-1.2.3.jar: a Logback-related dependency package,
logback-kafka-appender-0.2.0-RC2.jar: the dependency package for pushing logs to Kafka,
flink-log-1.0.0.jar: the dependency package output by the step of constructing the Flink job log.
Preferably, for the log configuration adjustment,
the log4j-related configuration files are first removed from the conf directory under the Flink deployment path, then logback.xml is modified to add the configuration that outputs logs to Kafka.
Preferably, for the Flink configuration adjustment,
to ensure that the log context selector takes effect, the flink-conf.yaml file under the conf directory under the Flink deployment path needs to be modified with the following new configuration:
env.java.opts: -Dlogback.ContextSelector=a.b.c.FlinkContextSelector
preferably, the log is monitored in real time,
after the Flink job logs are pushed to Kafka, they are obtained in real time through Kafka message subscription, where the Kafka message subscription can be implemented with an open source tool or a self-written program.
Further, a general log filtering rule configuration engine is constructed based on a log information structure of a standard specification;
the user-defined configuration log filtering rules are realized, and logs conforming to the filtering rules can be subjected to persistent storage on a relational database (such as mysql, postgresql and the like), a full-text search database (such as an elastic search and the like) and the like as required, so that subsequent analysis and audit are facilitated.
The invention also discloses a real-time monitoring system for the Flink job log, which comprises a Flink job log construction module, a Flink job log synchronization module and a real-time log monitoring module,
the system realizes the real-time monitoring of the Flink job log by the method.
The invention also claims a device for monitoring the Flink job log in real time, which comprises: at least one memory and at least one processor;
the at least one memory for storing a machine readable program;
the at least one processor is configured to invoke the machine-readable program to implement the method described above.
Compared with the prior art, the method for monitoring the Flink job log in real time has the following beneficial effects:
constructing a different log context for each task job by constructing the Flink job log, storing the job ID and job name in the log context, and finally outputting a standard, normative log structure; then synchronizing the Flink job log information to Kafka in real time by synchronizing the Flink job log; finally, obtaining the Flink job log in real time through Kafka message subscription and monitoring and analyzing it, where users can configure custom log filtering rules and logs matching the filtering rules can be persistently stored to facilitate analysis and audit. The method greatly reduces the time and labor cost of Flink operation and maintenance, enables real-time status monitoring of Flink jobs based on the logs, and can subsequently deliver real-time early warnings for abnormal jobs through notification channels such as mail and SMS, thereby avoiding business risks to the greatest extent.
Drawings
FIG. 1 is a schematic diagram of the method for monitoring a Flink job log in real time according to an embodiment of the present invention;
FIG. 2 is a core logic diagram of the Logback log context selector interface provided by an embodiment of the present invention;
FIG. 3 is example log configuration adjustment code provided by an embodiment of the present invention;
FIG. 4 is a diagram of the general log filtering rule configuration engine provided by an embodiment of the present invention;
FIG. 5 is an exemplary diagram of the core code implementing the Logback log context selector interface provided by an embodiment of the present invention;
FIG. 6 is an exemplary diagram of the core code of the custom log layout provided by an embodiment of the present invention;
FIG. 7 is an exemplary diagram of the configuration code, added to the modified logback.xml, for outputting logs to Kafka, provided by an embodiment of the present invention;
FIG. 8 is complete core code example I of real-time log monitoring provided by an embodiment of the present invention;
FIG. 9 is complete core code example II of real-time log monitoring provided by an embodiment of the present invention;
FIG. 10 is diagram I of the log data monitored in the corresponding Kafka topic provided by an embodiment of the present invention;
FIG. 11 is diagram II of the log data monitored in the corresponding Kafka topic provided by an embodiment of the present invention;
FIG. 12 is a diagram of the custom-configured log filtering rules provided by an embodiment of the present invention;
FIG. 13 is an exemplary diagram of logs meeting the filtering rules being written to a MySQL database in real time for persistent storage, provided by an embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to specific examples.
A method for monitoring a Flink job log in real time is shown in FIG. 1, and the implementation of the method comprises the following steps:
1. constructing a Flink job log, wherein the Flink job log comprises a core logic for realizing a Logback log context selector interface and a standard log structure defined by a custom log layout;
2. synchronizing the Flink job logs, including dependency adjustment, log configuration adjustment and Flink configuration adjustment, to achieve real-time synchronization of the logs to Kafka;
3. monitoring the logs in real time, including screening and filtering logs through the configuration of Flink log filtering rules, persistently storing the Flink logs for analysis and audit, and integrating notification channels such as mail and SMS to deliver real-time early-warning notifications.
The method implements the Logback log context selector interface, which makes it possible to construct a different log context for each task job and to store the job ID and job name in the log context; then the key log information is captured by the custom log layout and the log is synchronized to Kafka through the KafkaAppender component; finally, the Flink job log can be obtained in real time via Kafka message subscription and monitored and analyzed. A general log filtering rule configuration engine can be built on top of the fixed-structure log information, users can configure custom log filtering rules, and logs matching the filtering rules can be persistently stored to facilitate analysis and audit.
The method is concretely implemented as follows:
1. construction of Flink job logs
Constructing the Flink job log mainly comprises two parts: one is implementing the Logback log context selector interface, whose purpose is to store the job ID and job name in the log context; the other is the custom log layout, whose purpose is to define a standard, normative Flink log structure.
Implementing the Logback log context selector interface:
Logback provides the ch.qos.logback.classic.selector.ContextSelector interface to manage different log contexts; as shown in FIG. 2, the core logic with which the method implements the interface to obtain the log context (getLoggerContext) is as follows:
1.1 defining a log context memory map,
1.2, judging whether the current thread is an instance of org.apache.flink.runtime.taskmanager.Task; if so, executing step 1.3; otherwise, returning the default log context instance;
1.3, acquiring the current operation ID;
1.4, judging whether a log context instance of the job exists in the log context storage map; if yes, executing step 1.5; otherwise, creating a job log context instance, setting its job ID and job name attributes, storing it in the log context storage map, and executing step 1.5;
1.5, acquiring a log context instance of the current job from the log context memory map;
the implementation class is denoted a.b.c.FlinkContextSelector.
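The per-job caching described in steps 1.1–1.5 can be sketched in plain Java. The class below is a simplified, self-contained stand-in, not the actual a.b.c.FlinkContextSelector: a real implementation would implement ch.qos.logback.classic.selector.ContextSelector, return ch.qos.logback.classic.LoggerContext instances, and detect the Task thread itself rather than take a flag; JobLogContext and all other names here are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for Logback's LoggerContext; a real selector would
// create and configure ch.qos.logback.classic.LoggerContext instances.
class JobLogContext {
    final String jobId;
    final String jobName;
    JobLogContext(String jobId, String jobName) {
        this.jobId = jobId;
        this.jobName = jobName;
    }
}

public class FlinkContextSelectorSketch {
    // step 1.1: the log context storage map
    private static final Map<String, JobLogContext> CONTEXTS = new ConcurrentHashMap<>();
    private static final JobLogContext DEFAULT = new JobLogContext("default", "default");

    // steps 1.2-1.5: return the per-job context, creating and caching it on
    // first use; fall back to the default context for non-task threads
    static JobLogContext getLoggerContext(boolean isFlinkTaskThread, String jobId, String jobName) {
        if (!isFlinkTaskThread) {
            // step 1.2: current thread is not org.apache.flink.runtime.taskmanager.Task
            return DEFAULT;
        }
        // steps 1.3-1.5: look up by job ID, creating the context if absent
        return CONTEXTS.computeIfAbsent(jobId, id -> new JobLogContext(id, jobName));
    }

    public static void main(String[] args) {
        JobLogContext a = getLoggerContext(true, "job-1", "word-count");
        JobLogContext b = getLoggerContext(true, "job-1", "word-count");
        System.out.println(a == b);   // prints true: same cached instance per job
        System.out.println(getLoggerContext(false, null, null).jobName);
    }
}
```

The map lookup via computeIfAbsent keeps context creation atomic per job ID, which is why all log events of one job share one context while different jobs never mix.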
Custom log layout:
Logback provides the ch.qos.logback.core.LayoutBase abstract class for custom extension of the log layout, and the abstract class's log layout method (doLayout) is implemented according to the following log structure:
context (context): recording the log context name;
flinkJobId (Flink job ID): recording the Flink job ID to which the log belongs;
flinkJobName (Flink job name): recording the Flink job name to which the log belongs;
time (log time): recording the time at which the log was generated;
level (log level): recording the log level;
thread (thread): recording the thread that generated the log;
logger (logger): recording the name of the logger to which the log belongs;
message (log message): recording the detailed log information;
thf (exception stack): recording the exception stack information of the log;
the implementation class is denoted a.b.c.FlinkLogLayout.
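To make the layout contract concrete, the sketch below renders one log event into the nine-field structure defined above. The field names follow the patent; serializing them as a JSON object is an assumption (the patent fixes the fields, not the wire format), and FlinkLogLayoutSketch is an illustrative name, not the a.b.c.FlinkLogLayout class itself (whose core code is shown in FIG. 6). A real implementation would also escape quotes in the message and stack fields.

```java
// Sketch of the doLayout contract: one log event in, one structured line out,
// carrying the nine standard fields (context, flinkJobId, flinkJobName, time,
// level, thread, logger, message, thf).
public class FlinkLogLayoutSketch {
    static String doLayout(String context, String flinkJobId, String flinkJobName,
                           String time, String level, String thread,
                           String logger, String message, String thf) {
        return "{"
                + "\"context\":\"" + context + "\","
                + "\"flinkJobId\":\"" + flinkJobId + "\","
                + "\"flinkJobName\":\"" + flinkJobName + "\","
                + "\"time\":\"" + time + "\","
                + "\"level\":\"" + level + "\","
                + "\"thread\":\"" + thread + "\","
                + "\"logger\":\"" + logger + "\","
                + "\"message\":\"" + message + "\","
                + "\"thf\":\"" + thf + "\""
                + "}";
    }

    public static void main(String[] args) {
        // one rendered event; the job ID/name here are illustrative values
        System.out.println(doLayout("flink-ctx", "a1b2c3", "word-count",
                "2023-10-01 12:00:00", "INFO", "Task-1",
                "com.example.Job", "generated number 57", ""));
    }
}
```

Because every event carries flinkJobId and flinkJobName, a downstream consumer can group or filter logs per job without parsing free-form text.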
Finally, the two classes implemented in the step of constructing the Flink job log need to be built into a jar package for subsequent operations; the built jar package is denoted flink-log-1.0.0.jar.
2. Synchronizing Flink job logs
Dependency adjustment:
The log4j dependency packages are first removed from the lib directory under the Flink deployment path, and then the following jar packages are added to the lib directory:
kafka-clients-2.5.0.jar: the Kafka client connection dependency package,
logback-classic-1.2.3.jar: a Logback-related dependency package,
logback-core-1.2.3.jar: a Logback-related dependency package,
logback-kafka-appender-0.2.0-RC2.jar: the dependency package for pushing logs to Kafka,
flink-log-1.0.0.jar: the dependency package output by the step of constructing the Flink job log.
Log configuration adjustment:
First, the log4j-related configuration files are removed from the conf directory under the Flink deployment path; then logback.xml is modified to add the configuration that outputs logs to Kafka, the code of which is shown in FIG. 3;
the topic and producerConfig settings need to be modified according to the actual Kafka environment information.
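As a concrete illustration of this adjustment, the fragment below sketches what the added "output log to Kafka" section of logback.xml might look like (the actual configuration code is shown in FIG. 3). The appender and encoder class names follow the open-source logback-kafka-appender project that the dependency list points to, but the exact wiring here is an assumption, not the patent's code; the topic name and bootstrap.servers value are placeholders that must be adapted to the actual Kafka environment.

```xml
<!-- Sketch: route all log events through the custom layout into Kafka.
     Placeholder values: topic "flink-job-logs", broker "kafka-host:9092". -->
<appender name="kafkaAppender" class="com.github.danielwegener.logback.kafka.KafkaAppender">
    <!-- serialize each event with the custom layout built in step 1 -->
    <encoder class="com.github.danielwegener.logback.kafka.encoding.LayoutKafkaMessageEncoder">
        <layout class="a.b.c.FlinkLogLayout"/>
    </encoder>
    <topic>flink-job-logs</topic>
    <!-- producerConfig entries are passed to the underlying Kafka producer -->
    <producerConfig>bootstrap.servers=kafka-host:9092</producerConfig>
</appender>

<root level="INFO">
    <appender-ref ref="kafkaAppender"/>
</root>
```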
Flink configuration adjustment:
to ensure that the log context selector takes effect, the flink-conf.yaml file under the conf directory under the Flink deployment path needs to be modified with the following new configuration:
env.java.opts: -Dlogback.ContextSelector=a.b.c.FlinkContextSelector
after the dependency adjustment, log configuration adjustment, and Flink configuration adjustment operations are completed, the Flink needs to be restarted to validate the configuration.
3. The log is monitored in real time,
after the Flink job logs are pushed to Kafka, they can be obtained in real time through Kafka message subscription, which can be implemented with an open source tool or a self-written program; based on the standard, normative log information structure, a general log filtering rule configuration engine is constructed, as shown in FIG. 4;
the user can configure custom log filtering rules, and logs matching the filtering rules can be persistently stored as needed in a relational database (such as MySQL or PostgreSQL) or a full-text search database (such as Elasticsearch), which facilitates subsequent analysis and audit.
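The filtering rules the engine of FIG. 4 evaluates can be illustrated with a minimal rule evaluator over the standard log fields. The rule shape below — an exact flinkJobName match plus a minimum log level — is an illustrative assumption about what such rules contain; the patent's engine is general and its actual rule model is not specified here.

```java
import java.util.Map;

// Sketch of one log filtering rule evaluated against a log entry represented
// as a field map (the standard structure: flinkJobName, level, ...). Entries
// for which matches() returns true would be persisted to MySQL/Elasticsearch.
public class LogFilterRuleSketch {
    final String jobNameEquals;   // required flinkJobName, or null for any job
    final String minLevel;        // minimum level to keep (INFO/WARN/ERROR)

    LogFilterRuleSketch(String jobNameEquals, String minLevel) {
        this.jobNameEquals = jobNameEquals;
        this.minLevel = minLevel;
    }

    private static int rank(String level) {
        switch (level) {
            case "ERROR": return 3;
            case "WARN":  return 2;
            case "INFO":  return 1;
            default:      return 0;   // DEBUG / TRACE
        }
    }

    // true when the log entry matches the rule and should be persisted
    boolean matches(Map<String, String> logEntry) {
        if (jobNameEquals != null && !jobNameEquals.equals(logEntry.get("flinkJobName"))) {
            return false;
        }
        return rank(logEntry.get("level")) >= rank(minLevel);
    }

    public static void main(String[] args) {
        LogFilterRuleSketch rule = new LogFilterRuleSketch("word-count", "WARN");
        System.out.println(rule.matches(Map.of("flinkJobName", "word-count", "level", "ERROR"))); // prints true
        System.out.println(rule.matches(Map.of("flinkJobName", "word-count", "level", "INFO")));  // prints false
    }
}
```

Because the log structure is fixed, rules only ever address known field names, which is what makes a general, user-configurable engine possible.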
Taking Flink-1.14.3 as an example, with CentOS 7.6 as the operating system, the specific implementation steps of the method for monitoring the Flink job log in real time based on Logback and Kafka are as follows:
1. construction of Flink job logs
Realizing a Logback log context selector interface, wherein core realization codes are shown in figure 5;
the log layout is customized, and the core implementation code is shown in FIG. 6;
the two implementation classes are built into flink-log-1.0.0.jar.
2. Synchronizing Flink job logs
Dependency adjustment:
The following commands are executed to remove the log4j dependency packages:
# create a backup directory for the log4j dependency packages
mkdir log4j-lib-bak
# move the log4j dependency packages (log4j-api, log4j-slf4j, log4j-core) to the backup directory
mv lib/log4j-*.jar log4j-lib-bak/
The following jar packets are then added to the lib directory:
kafka-clients-2.5.0.jar (Kafka client connection dependency package)
logback-classic-1.2.3.jar (Logback-related dependency package)
logback-core-1.2.3.jar (Logback-related dependency package)
logback-kafka-appender-0.2.0-RC2.jar (dependency package for pushing logs to Kafka)
flink-log-1.0.0.jar (dependency package output by the step of constructing the Flink job log)
hutool-all-5.8.9.jar (dependency package required by flink-log-1.0.0.jar)
Log configuration adjustment
The following commands are executed to remove the log4j-related configuration files:
# create a backup directory for the log4j configuration files
mkdir conf/log4j-config-bak
# move log4 j-related configuration to backup directory
mv conf/log4j*.properties conf/log4j-config-bak/
Then logback.xml is modified to add the configuration that outputs logs to Kafka, as shown in FIG. 7.
Flink configuration adjustment
Modify the flink-conf.yaml file under the conf directory under the Flink deployment path and add the following configuration:
env.java.opts: -Dlogback.ContextSelector=a.b.c.FlinkContextSelector
Restart Flink.
3. Real-time log monitoring
A Flink job is submitted to the Flink cluster. The job generates a random number every 3 seconds and prints a log entry each time a number is generated; if the number is greater than or equal to 50 it is additionally output and printed, and after more than 10 random numbers have been generated an exception is thrown manually to terminate the job. The core code implementing this job is shown in FIGS. 8 and 9.
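The demo job's control flow can be sketched in plain Java as follows; the real job (FIGS. 8 and 9) is implemented with the Flink streaming API, so this is a simulation of its logic only. The 3-second sleep between numbers is omitted so the sketch runs instantly, and all names are illustrative.

```java
import java.util.Random;

// Plain-Java simulation of the demo job: log every generated random number,
// give numbers >= 50 an extra print, and throw a manual exception to
// terminate after more than 10 numbers have been generated.
public class RandomNumberJobSketch {
    static void run(long seed) {
        Random random = new Random(seed);
        int generated = 0;
        while (true) {
            int n = random.nextInt(100);
            generated++;
            System.out.println("generated: " + n);       // log every number
            if (n >= 50) {
                System.out.println("big number: " + n);  // extra output for n >= 50
            }
            if (generated > 10) {
                // manual exception terminates the job, as in the demo
                throw new IllegalStateException("terminating after " + generated + " numbers");
            }
        }
    }

    public static void main(String[] args) {
        try {
            run(42L);
        } catch (IllegalStateException e) {
            System.out.println("job terminated: " + e.getMessage());
        }
    }
}
```

With the logging setup above, each println would instead be a logger call, so every generated number and the final exception stack land in Kafka tagged with the job's flinkJobId and flinkJobName.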
After submission, the log data in the corresponding Kafka topic (flink-job-logs) is monitored, as shown in FIGS. 10 and 11. As can be seen in the figures, the Flink job logs are synchronized to Kafka in real time and contain the flinkJobId and flinkJobName attributes.
The custom-configured log filtering rule is shown in FIG. 12; logs matching the filtering rule are written to the MySQL database in real time for persistent storage, which facilitates analysis and audit, as shown in FIG. 13.
The invention also discloses a real-time monitoring system for the Flink job log, which comprises a Flink job log construction module, a Flink job log synchronization module and a real-time log monitoring module,
the system realizes the real-time monitoring of the Flink job log by the method.
The embodiment of the invention also provides a device for monitoring the Flink job log in real time, which comprises the following steps: at least one memory and at least one processor;
the at least one memory for storing a machine readable program;
the at least one processor is configured to invoke the machine-readable program to implement the method described in the foregoing embodiments.
The present invention can be easily implemented by those skilled in the art through the above specific embodiments. It should be understood that the invention is not limited to the particular embodiments described above. Based on the disclosed embodiments, a person skilled in the art may combine different technical features at will, so as to implement different technical solutions.
Other than the technical features described in the specification, all are known to those skilled in the art.
Claims (10)
1. A method for monitoring a Flink job log in real time is characterized by comprising the following steps:
1) Constructing a Flink job log, wherein the Flink job log comprises a core logic for realizing a Logback log context selector interface and a standard log structure defined by a custom log layout;
2) Synchronizing the Flink job logs, including dependency adjustment, log configuration adjustment and Flink configuration adjustment, to achieve real-time synchronization of the logs to Kafka;
3) Monitoring the logs in real time, including screening and filtering logs through the configuration of Flink log filtering rules, persistently storing the Flink logs for analysis and audit, and integrating mail to deliver real-time early-warning notifications.
2. The method for real-time monitoring of a Flink job log according to claim 1, wherein, for the core logic of the Logback log context selector interface,
Logback provides the ch.qos.logback.classic.selector.ContextSelector interface to manage different log contexts, and the core logic with which the interface obtains a log context is as follows:
1.1 Defining a log context storage map,
1.2 Judging whether the current thread is an instance of org.apache.flink.runtime.taskmanager.Task; if yes, executing step 1.3); otherwise, returning the default log context instance;
1.3 Obtaining the current job ID;
1.4 Judging whether a log context instance of the job exists in the log context storage map; if yes, executing step 1.5); otherwise, creating a job log context instance, setting its job ID and job name attributes, storing it in the log context storage map, and executing step 1.5);
1.5 Obtaining a log context instance of the current job from the log context storage map;
the above steps realize that the job ID and the job name are saved in the log context.
3. The method for monitoring the Flink job log in real time according to claim 2, wherein, for the standard log structure defined by the custom log layout,
Logback provides the ch.qos.logback.core.LayoutBase abstract class for custom extension of the log layout, and the abstract class's log layout method is implemented according to the following log structure:
context: the context, recording the log context name;
flinkJobId: the Flink job ID, recording the Flink job ID to which the log belongs;
flinkJobName: the Flink job name, recording the Flink job name to which the log belongs;
time: the log time, recording the time at which the log was generated;
level: the log level, recording the log level;
thread: the thread, recording the thread that generated the log;
logger: the logger, recording the name of the logger to which the log belongs;
message: the log message, recording the detailed log information;
thf: the exception stack, recording the exception stack information of the log;
the implementation class is denoted as a.b.c.FlinkLogLayout;
finally, the two classes implemented in the step of constructing the Flink job log need to be built into a jar package for subsequent operations; the built jar package is denoted flink-log-1.0.0.jar.
4. The method for real-time monitoring of a Flink job log according to claim 1, 2 or 3, wherein, for the dependency adjustment,
the log4j dependency packages are first removed from the lib directory under the Flink deployment path, and then the following jar packages are added to the lib directory:
kafka-clients-2.5.0.jar: the Kafka client connection dependency package,
logback-classic-1.2.3.jar: a Logback-related dependency package,
logback-core-1.2.3.jar: a Logback-related dependency package,
logback-kafka-appender-0.2.0-RC2.jar: the dependency package for pushing logs to Kafka,
flink-log-1.0.0.jar: the dependency package output by the step of constructing the Flink job log.
5. The method for monitoring the Flink job log in real time according to claim 4, wherein, in the log configuration adjustment,
first, the log4j-related configuration files are removed from the conf directory under the Flink deployment path; then logback.xml is modified to add the configuration for outputting logs to Kafka.
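A hedged sketch of the kind of addition claim 5 describes in logback.xml, assuming the logback-kafka-appender library listed in claim 4; the appender name, topic, broker address, and encoder wiring shown here are illustrative assumptions, not values from the patent:

```xml
<!-- Illustrative only: routes every log event to a Kafka topic. -->
<appender name="kafkaAppender" class="com.github.danielwegener.logback.kafka.KafkaAppender">
  <encoder class="com.github.danielwegener.logback.kafka.encoding.LayoutKafkaMessageEncoder">
    <layout class="a.b.c.FlinkLogLayout"/>
  </encoder>
  <topic>flink-logs</topic>
  <producerConfig>bootstrap.servers=localhost:9092</producerConfig>
</appender>
<root level="INFO">
  <appender-ref ref="kafkaAppender"/>
</root>
```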
6. The method for monitoring the Flink job log in real time according to claim 5, wherein, in the Flink configuration adjustment,
to ensure that the log context selector takes effect, the flink-conf.yaml file under the conf directory under the Flink deployment path needs to be modified, with the following configuration added:
env.java.opts: -Dlogback.ContextSelector=a.b.c.FlinkContextSelector
7. The method for monitoring the Flink job log in real time according to claim 6, wherein, in the real-time log monitoring,
after the Flink job logs are pushed to Kafka, they are obtained in real time by subscribing to the Kafka messages, where the Kafka message subscription can be implemented with an open-source tool or with self-developed code.
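For the self-developed route, a subscriber could be configured with standard Kafka consumer properties along these lines; the broker address, group id, and topic comment are assumptions for illustration, not values from the patent:

```properties
bootstrap.servers=localhost:9092
group.id=flink-log-monitor
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
auto.offset.reset=latest
# The consumer subscribes to the topic that the Kafka appender writes
# the Flink job logs to.
```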
8. The method for monitoring the Flink job log in real time according to claim 7, wherein a general log filtering rule configuration engine is constructed based on the standardized log information structure;
users can configure custom log filtering rules, and logs matching the filtering rules can be written to a relational database and a full-text retrieval database as needed for persistent storage, facilitating subsequent analysis and audit.
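The patent does not specify the rule format, so the following sketch assumes a minimal rule consisting of a minimum log level plus an optional message pattern; the class name, level ordering, and matching semantics are illustrative assumptions.

```java
import java.util.List;
import java.util.regex.Pattern;

public class LogFilterRule {
    // Levels ordered from least to most severe.
    private static final List<String> LEVELS =
        List.of("TRACE", "DEBUG", "INFO", "WARN", "ERROR");

    private final int minLevelIdx;
    private final Pattern messagePattern; // null means "match any message"

    public LogFilterRule(String minLevel, String messageRegex) {
        this.minLevelIdx = LEVELS.indexOf(minLevel);
        this.messagePattern = messageRegex == null ? null : Pattern.compile(messageRegex);
    }

    // A log entry passes if its level is at or above the threshold and its
    // message matches the configured pattern (if any); entries that pass
    // would then be handed to the persistence layer.
    public boolean matches(String level, String message) {
        int idx = LEVELS.indexOf(level);
        if (idx < 0 || idx < minLevelIdx) return false;
        return messagePattern == null || messagePattern.matcher(message).find();
    }

    public static void main(String[] args) {
        LogFilterRule rule = new LogFilterRule("WARN", "timeout");
        System.out.println(rule.matches("ERROR", "checkpoint timeout expired"));
        System.out.println(rule.matches("INFO", "timeout"));
    }
}
```

A configuration engine in this spirit would hold a set of such rules per job, apply them to each message consumed from Kafka, and forward matches to the relational or full-text store.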
9. A real-time monitoring system for a Flink job log, characterized by comprising a Flink job log construction module, a Flink job log synchronization module, and a real-time log monitoring module,
wherein the system realizes real-time monitoring of the Flink job log by the method of any one of claims 1 to 8.
10. A Flink job log real-time monitoring device, characterized by comprising: at least one memory and at least one processor;
the at least one memory for storing a machine readable program;
the at least one processor being configured to invoke the machine readable program to implement the method of any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311290323.4A CN117389825A (en) | 2023-10-08 | 2023-10-08 | Method, system and device for monitoring Flink job log in real time |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311290323.4A CN117389825A (en) | 2023-10-08 | 2023-10-08 | Method, system and device for monitoring Flink job log in real time |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117389825A true CN117389825A (en) | 2024-01-12 |
Family
ID=89464057
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311290323.4A Pending CN117389825A (en) | 2023-10-08 | 2023-10-08 | Method, system and device for monitoring Flink job log in real time |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117389825A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN117931756A (en) * | 2024-03-25 | 2024-04-26 | 广州睿帆科技有限公司 | FTP file real-time monitoring and analyzing system and method based on Flink
CN117931756B (en) * | 2024-03-25 | 2024-06-04 | 广州睿帆科技有限公司 | FTP file real-time monitoring and analyzing system and method based on Flink
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111241078B (en) | Data analysis system, data analysis method and device | |
CN107832196B (en) | Monitoring device and monitoring method for abnormal content of real-time log | |
US10313285B1 (en) | System and method for analyzing and filtering journaled electronic mail | |
CN109460349A (en) | A kind of method for generating test case and device based on log | |
US20060184529A1 (en) | System and method for analysis and management of logs and events | |
US20080065400A1 (en) | System and Method for Producing Audit Trails | |
CN111030857A (en) | Network alarm method, device, system and computer readable storage medium | |
CN111125260A (en) | Data synchronization method and system based on SQL Server | |
CN117389825A (en) | Method, system and device for monitoring Flink job log in real time | |
US20190363986A1 (en) | Reconstructing message flows based on hash values | |
CN114741375A (en) | Rapid and automatic data migration system and method for multi-source heterogeneous database | |
CN110941632A (en) | Database auditing method, device and equipment | |
CN112866319B (en) | Log data processing method, system and storage medium | |
CN115328758A (en) | Performance test method and system for large data volume of industrial software | |
CN106649000B (en) | Fault recovery method of real-time processing engine and corresponding server | |
CN112559525B (en) | Data checking system, method, device and server | |
CN114528270A (en) | System and method for automatically associating real-time stream data with service dimension information in cloud environment | |
CN114911872B (en) | Internal and external network data synchronization method, device and system, external network server and storage medium | |
CN111130882A (en) | Monitoring system and method of network equipment | |
CN115168297A (en) | Bypassing log auditing method and device | |
Yuan et al. | Design and implementation of accelerator control monitoring system | |
CN110515803A (en) | For the processing method of log information, device and electronic equipment | |
CN113821407B (en) | Storm distributed real-time computing method and system | |
CN115348185B (en) | Control method and control device of distributed query engine | |
CN115909533B (en) | System safety inspection method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||