CN113342608B - Method and device for monitoring tasks of streaming computing engine - Google Patents


Info

Publication number
CN113342608B
Authority
CN
China
Prior art keywords
delay, task, record, time, Flink
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110639027.5A
Other languages
Chinese (zh)
Other versions
CN113342608A (en
Inventor
刘伟
金磐石
杨晓勤
李世宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Application filed by China Construction Bank Corp
Priority to CN202110639027.5A
Publication of CN113342608A
Application granted
Publication of CN113342608B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3017 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is implementing multitasking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18 Status alarms

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for monitoring a task of a streaming computing engine. The method comprises: periodically acquiring an application identifier of a Flink application according to a first time step; acquiring a task identification list that corresponds to the application identifier and contains at least one task identifier; determining the Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on the Flink tasks; generating a delay record corresponding to the overall delay time, marking the state of the delay record as unread, and writing the marked delay record into a record file; and, according to a second time step, periodically calling a monitoring module to read the delay records in the unread state from the record file and to judge whether any of them is a delay record with abnormal delay, and, if so, sending a generated alarm instruction to an alarm module so that the alarm module raises an alarm. Monitoring the delay records in this way alerts staff promptly when a delay abnormality occurs.

Description

Method and device for monitoring tasks of streaming computing engine
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for monitoring a task of a streaming computing engine.
Background
With the rapid development of big data, popular open-source community technologies such as Hadoop, Storm, Spark and Flink have come into use across the computer industry. Flink is a distributed processing engine for streaming data and batch data, and it is currently the only distributed streaming data processing framework in the open-source community that combines high throughput, low delay and high performance, so it has become the mainstream choice of users in the real-time computing field.
An application built on Flink may be referred to as a Flink application. When running, a Flink application executes its corresponding Flink tasks to implement its functions; a Flink task may also be referred to as a streaming computing engine task. Although the Flink framework has many advantages, its means of raising alarms on task execution delay are incomplete, so an application built on the Flink framework is prone to long task delays at runtime that cannot be resolved in time.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for monitoring a task of a streaming computing engine, which are used for monitoring the task of the streaming computing engine and giving an alarm in time when a delay abnormality occurs, so that a worker can learn the delay abnormality in time.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
A method for monitoring a task of a streaming computing engine, comprising:
periodically acquiring an application identifier of a Flink application according to a preset first time step;
acquiring a task identification list corresponding to the application identifier, wherein the task identification list comprises at least one task identifier;
determining the Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on the Flink tasks;
generating a delay record corresponding to the overall delay time, marking the state of the delay record as unread, and writing the marked delay record into a preset record file;
periodically calling a monitoring module to read the record file according to a preset second time step, so that the monitoring module obtains each delay record in the unread state from the record file, updates the state of each obtained delay record to read, and analyzes the obtained delay records to judge whether any of them is a delay record with abnormal delay; if such a delay record exists, the monitoring module generates an alarm instruction and sends it to a preset alarm module, so that the alarm module raises a delay alarm.
Optionally, the method further comprises:
storing, by the monitoring module, the obtained delay records in a preset data storage platform;
calling a preset visualization component to process the delay records in the data storage platform, so that the visualization component displays the delay records in the data storage platform.
Optionally, in the method, determining the overall delay time of the Flink application based on the Flink tasks comprises:
determining a key task among the Flink tasks;
determining a task horizontal line time of the key task, and determining a current time of the node executing the key task;
determining a first delay time of the key task based on the current time and the task horizontal line time, and taking the first delay time as the overall delay time of the Flink application.
Optionally, in the method, determining the overall delay time of the Flink application based on the Flink tasks comprises:
for each Flink task, determining a task horizontal line time of the Flink task and a current time of the node executing the Flink task, and calculating a second delay time of the Flink task based on the task horizontal line time and the current time;
performing a weighted average operation on the second delay times to obtain an average delay time, and taking the average delay time as the overall delay time of the Flink application.
Optionally, in the method, generating a delay record corresponding to the overall delay time comprises:
collecting node information of the node executing the key task;
filling the node information and the first delay time into a preset first record template to obtain the delay record corresponding to the overall delay time.
Optionally, in the method, generating a delay record corresponding to the overall delay time comprises:
for each Flink task, determining node information of the node executing the Flink task;
writing the average delay time and, for each Flink task, its node information and second delay time into a preset second record template to generate the delay record corresponding to the overall delay time.
A monitoring device for a streaming computing engine task, comprising:
a first acquisition unit, configured to periodically acquire an application identifier of a Flink application according to a preset first time step;
a second acquisition unit, configured to acquire a task identification list corresponding to the application identifier, wherein the task identification list comprises at least one task identifier;
a determining unit, configured to determine the Flink task corresponding to each task identifier and to determine the overall delay time of the Flink application based on the Flink tasks;
a generating unit, configured to generate a delay record corresponding to the overall delay time, mark the state of the delay record as unread, and write the marked delay record into a preset record file;
an alarm unit, configured to periodically call a monitoring module to read the record file according to a preset second time step, so that the monitoring module obtains each delay record in the unread state from the record file, updates the state of each obtained delay record to read, and analyzes the obtained delay records to judge whether any of them is a delay record with abnormal delay; if such a delay record exists, the monitoring module generates an alarm instruction and sends it to a preset alarm module, so that the alarm module raises a delay alarm.
Optionally, the device further comprises:
a storage unit, configured to have the monitoring module store the obtained delay records in a preset data storage platform;
a calling unit, configured to call a preset visualization component to process the delay records in the data storage platform, so that the visualization component displays the delay records in the data storage platform.
Optionally, in the device, the determining unit comprises:
a first determining subunit, configured to determine a key task among the Flink tasks;
a second determining subunit, configured to determine a task horizontal line time of the key task and a current time of the node executing the key task;
a third determining subunit, configured to determine a first delay time of the key task based on the current time and the task horizontal line time, and to take the first delay time as the overall delay time of the Flink application.
Optionally, in the device, the determining unit comprises:
a fourth determining subunit, configured to determine, for each Flink task, a task horizontal line time of the Flink task and a current time of the node executing the Flink task, and to calculate a second delay time of the Flink task based on the task horizontal line time and the current time;
an operation subunit, configured to perform a weighted average operation on the second delay times to obtain an average delay time, and to take the average delay time as the overall delay time of the Flink application.
Optionally, in the device, the generating unit comprises:
an acquisition subunit, configured to collect node information of the node executing the key task;
an obtaining subunit, configured to fill the node information and the first delay time into a preset first record template to obtain the delay record corresponding to the overall delay time.
Optionally, in the device, the generating unit comprises:
a fifth determining subunit, configured to determine, for each Flink task, node information of the node executing the Flink task;
a generation subunit, configured to write the average delay time and, for each Flink task, its node information and second delay time into a preset second record template to generate the delay record corresponding to the overall delay time.
Compared with the prior art, the invention has the following advantages:
The invention provides a method and a device for monitoring a task of a streaming computing engine. The method comprises: periodically acquiring an application identifier of a Flink application according to a first time step; acquiring a task identification list corresponding to the application identifier, wherein the task identification list comprises at least one task identifier; determining the Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on the Flink tasks; generating a delay record corresponding to the overall delay time, marking the state of the delay record as unread, and writing the marked delay record into a record file; and, according to a second time step, periodically calling a monitoring module to read the delay records in the unread state from the record file and to judge whether any of them is a delay record with abnormal delay, and, if so, sending a generated alarm instruction to an alarm module so that the alarm module raises a delay alarm. A delay record of the overall delay time of the Flink application is thus generated and checked for abnormal delay; if an abnormal delay record exists, staff are alerted promptly so that they can resolve the delay abnormality in time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for monitoring tasks of a streaming computing engine according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for monitoring tasks of a streaming computing engine according to an embodiment of the present invention;
FIG. 3 is a flowchart of another method for monitoring tasks of a streaming computing engine according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a monitoring device for tasks of a streaming computing engine according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the present disclosure, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
To solve the problem that an application built on the Flink framework is prone to long task delays at runtime that cannot be resolved in time, the invention provides a method for monitoring a task of a streaming computing engine. The method can be applied to a monitoring platform, and its execution subject can be a processor or a server in the monitoring platform. FIG. 1 is a flowchart of the method for monitoring a task of a streaming computing engine provided by an embodiment of the invention, which is described as follows:
S101, periodically acquiring application identifiers of the Flink application according to a preset first time step.
In the method provided by the embodiment of the invention, the monitoring platform periodically acquires the application identifier of the Flink application according to the preset first time step. The first time step can be set based on actual requirements; for example, with a first time step of 2 minutes, the application identifier of the Flink application is acquired every 2 minutes. The application identifier is the unique identity of a Flink application, which is a Flink computing application program running on a Yarn cluster.
S102, acquiring a task identification list corresponding to the application identification, wherein the task identification list comprises at least one task identification.
Based on the application identifier, the task identification list of the Flink application is obtained. Each task identification list has a list identifier that identifies the list and is associated with the application identifier of a Flink application; the task identification list whose list identifier matches the application identifier is taken as the task identification list of the Flink application.
The task identification list comprises at least one task identification, wherein the task identification is the unique identification of the Flink task of the Flink application.
S103, determining the Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on each Flink task.
The Flink task corresponding to each task identifier is determined, where a Flink task is a task or process that the Flink application needs to execute while running. Based on the Flink tasks, the overall delay time of the Flink application is determined. The overall delay time is a time length, for example 5 minutes or 30 seconds; it characterizes the delay of the Flink application during execution and is used subsequently to determine whether that delay is abnormal.
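Steps S101 to S103 can be sketched as one collection cycle that is repeated every first time step. This is a minimal illustration, not the patented implementation: the three callables (`fetch_app_ids`, `fetch_task_ids`, `compute_overall_delay`) are hypothetical stand-ins for whatever the monitoring platform uses to query the cluster, and the sleep function is injected so the loop can be exercised without waiting.

```python
from typing import Callable, Dict, List


def collect_once(
    fetch_app_ids: Callable[[], List[str]],
    fetch_task_ids: Callable[[str], List[str]],
    compute_overall_delay: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Run one collection cycle and return the overall delay per application."""
    delays: Dict[str, float] = {}
    for app_id in fetch_app_ids():              # S101: application identifiers
        task_ids = fetch_task_ids(app_id)       # S102: task identification list
        if not task_ids:                        # the list must hold at least one task identifier
            continue
        delays[app_id] = compute_overall_delay(task_ids)  # S103: overall delay time
    return delays


def run_periodically(
    cycle: Callable[[], None],
    first_time_step_s: float,
    cycles: int,
    sleep: Callable[[float], None],
) -> None:
    """Call `cycle` once per first time step, `cycles` times in total."""
    for _ in range(cycles):
        cycle()
        sleep(first_time_step_s)
```

In a real deployment `sleep` would be `time.sleep` and `first_time_step_s` would be, for example, 120 for the 2-minute step mentioned above.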
S104, generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file.
A delay record is generated for the overall delay time; the delay record records the running condition of the Flink application in the current first time step. The delay record is marked as unread, which indicates that it has not yet been read, and the marked delay record is written into a preset record file.
The record file stores the delay records that record the running condition of the Flink application. The record file may hold only delay records in the unread state, that is, it contains no delay records in the read state; alternatively, the record file may hold delay records in multiple states, such as delay records in the unread state and delay records in the read state.
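As an illustration of S104, the sketch below appends a delay record marked as unread to a record file. The file format (one JSON object per line) and the field names `app_id`, `overall_delay_s` and `state` are assumptions made for this example; the patent does not specify a file format.

```python
import json


def append_delay_record(path: str, app_id: str, overall_delay_s: float) -> dict:
    """Mark a new delay record as unread and append it to the record file."""
    record = {
        "app_id": app_id,                    # hypothetical field name
        "overall_delay_s": overall_delay_s,  # overall delay time in seconds
        "state": "unread",                   # not yet read by the monitoring module
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")   # one record per line
    return record
```

Appending keeps earlier records in place, which matches the variant in which the record file holds records in both the unread and the read state.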
S105, periodically calling a monitoring module to read the record file according to a preset second time step, so that the monitoring module obtains each delay record with the unread state in the record file, and updates the obtained state of each delay record into the read state.
In the method provided by the invention, the monitoring platform also periodically calls the monitoring module to read the record file according to the preset second time step, so that the monitoring module obtains the delay records in the unread state from the record file at regular intervals. The second time step can be set according to actual requirements; preferably, the second time step is longer than or equal to the first time step. After the monitoring module obtains the delay records in the unread state, the state of each obtained delay record in the record file is updated to read.
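The read-and-mark behaviour of S105 can be sketched as follows. The example assumes the record file holds one JSON object per line with a `state` field, which is an illustrative format rather than one given in the patent.

```python
import json
from typing import List


def read_unread_records(path: str) -> List[dict]:
    """Obtain the unread delay records and update their state to read in the file."""
    with open(path, "r", encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    unread = [r for r in records if r.get("state") == "unread"]
    for r in unread:
        r["state"] = "read"  # the returned records already carry the updated state

    # Rewrite the whole file so the state change is persisted.
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    return unread
```

Rewriting the file in place is the simplest option for a sketch; if the collector may append while the monitoring module rewrites, some form of locking or a write-then-rename step would be needed.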
S106, judging whether delay records with abnormal delay exist in the acquired delay records, and executing S107 if delay records with abnormal delay exist; if there is no delay record of the delay abnormality, S108 is executed.
Each obtained delay record is analyzed to determine the overall delay time it contains. Whether a delay record with abnormal delay exists among the obtained delay records is judged as follows:
the overall delay time in each delay record is compared with a preset delay time to judge whether any overall delay time is greater than or equal to the preset delay time;
if at least one overall delay time is greater than or equal to the preset delay time, it is determined that delay records with abnormal delay exist, and each delay record whose overall delay time is greater than or equal to the preset delay time is determined to be a delay record with abnormal delay; in this case, it can be determined that the Flink application has a delay abnormality;
if no overall delay time is greater than or equal to the preset delay time, it is determined that no delay record with abnormal delay exists and that every delay record is a delay record with normal delay; in this case, it can be determined that the Flink application has no delay abnormality.
The preset delay time in the present invention is a time length, for example, 2 minutes or 1 minute, and the preset delay time can be set according to actual requirements.
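The comparison in S106 amounts to a threshold filter. A minimal sketch, assuming each delay record is a dictionary with an `overall_delay_s` field (a hypothetical name) expressed in seconds:

```python
from typing import List


def find_abnormal_records(records: List[dict], preset_delay_s: float) -> List[dict]:
    """Return the delay records whose overall delay time is >= the preset delay time."""
    return [r for r in records if r["overall_delay_s"] >= preset_delay_s]


def has_delay_abnormality(records: List[dict], preset_delay_s: float) -> bool:
    """True if at least one delay record is a delay record with abnormal delay."""
    return bool(find_abnormal_records(records, preset_delay_s))
```

Note that a record whose overall delay time equals the preset delay time counts as abnormal, since the comparison is "greater than or equal to".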
S107, generating an alarm instruction and sending the alarm instruction to a preset alarm module so that the alarm module carries out delay alarm.
When delay records with abnormal delay exist among the delay records, an alarm instruction corresponding to them is generated; the alarm instruction contains the information of the delay records with abnormal delay. The alarm instruction is sent to the alarm module so that the alarm module raises a delay alarm. The alarm module can alarm in various ways, such as sending an alarm short message, sending an alarm mail or sounding an alarm. The delay alarm raised by the alarm module is in effect a delay abnormality alarm, which informs staff of the delay abnormality of the Flink application in time so that they can resolve it promptly.
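One possible shape for S107 is sketched below. The structure of the alarm instruction and the channel interface are assumptions for illustration: each channel is simply a callable that accepts a message, standing in for a short message, mail or sound back end.

```python
from typing import Callable, Dict, List


def build_alarm_instruction(abnormal_records: List[dict]) -> dict:
    """Wrap the delay records with abnormal delay in an alarm instruction."""
    return {"type": "delay_alarm", "records": list(abnormal_records)}


def raise_delay_alarm(instruction: dict, channels: Dict[str, Callable[[str], None]]) -> str:
    """Alarm module stand-in: send one message through every configured channel."""
    count = len(instruction["records"])
    message = f"Delay abnormality detected in {count} delay record(s)"
    for send in channels.values():
        send(message)
    return message
```

Keeping the channels behind a plain callable interface lets the alarm modes listed above be added or removed without changing the monitoring logic.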
S108, generating a monitoring normal record, and storing the monitoring normal record.
A monitoring normal record is generated based on the record information of each delay record with normal delay, and the monitoring normal record is stored for subsequent checking.
In the method provided by the embodiment of the invention, the monitoring module can store each obtained delay record in a preset data storage platform. The data storage platform includes, but is not limited to, a database and ES (Elasticsearch, an open-source distributed search and data analysis engine), where ES is used for storing, retrieving and analyzing data.
After the monitoring module stores the delay records in the data storage platform, a preset visualization component can be called to process the delay records in the data storage platform, so that the visualization component displays them. When displaying the delay records, the visualization component can plot the delay time in each delay record along the time dimension and thereby show the overall delay condition.
In the method provided by the embodiment of the invention, the application identifier of the Flink application is periodically acquired according to the first time step; acquiring a task identification list corresponding to the application identification, wherein the task identification list comprises at least one task identification; determining a Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on each Flink task; generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file; according to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record with the unread state in the record file, updates the obtained state of each delay record into the read state, analyzes each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, generates an alarm instruction if delay records with abnormal delay exist, and sends the alarm instruction to a preset alarm module, so that the alarm module carries out delay alarm. Based on each Flink task of the Flink application, the overall delay time of the Flink application is determined, a monitoring module is used for monitoring delay records corresponding to the overall delay time, and when delay records with abnormal delay exist, an alarm module is triggered in time to alarm, so that staff can find problems in time.
FIG. 2 is a flowchart of a method for determining the overall delay time of a Flink application based on the Flink tasks according to an embodiment of the present invention, which is described as follows:
S201, determining a key task in each Flink task.
In the method provided by the embodiment of the invention, a key task is determined among the Flink tasks, where the key task is the Flink task executed last. To determine the key task, the execution time of each Flink task is first acquired, where the execution time is the time at which the Flink task is executed by the corresponding node of the Yarn cluster; the latest execution time is then determined, and the Flink task corresponding to the latest execution time is taken as the key task.
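Selecting the key task in S201 is a maximum over execution times. A minimal sketch, assuming the execution times are available as a mapping from task identifier to a timestamp in seconds (how they are obtained from the cluster is outside this example):

```python
from typing import Dict


def select_key_task(execution_times: Dict[str, float]) -> str:
    """Return the identifier of the Flink task with the latest execution time."""
    if not execution_times:
        raise ValueError("at least one Flink task is required")
    # max() over the keys, ordered by each task's execution time.
    return max(execution_times, key=lambda task_id: execution_times[task_id])
```

If two tasks share the latest execution time, `max` returns the first one encountered; the source does not say how such ties should be resolved.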
S202, determining task horizontal line time of the key task, and determining current time of a node executing the key task.
Acquiring task horizontal line time of a key task, wherein the task horizontal line time is a time stamp; and determining the current time of the node for executing the critical task, wherein the current time of the node is the system time of the node.
S203, determining a first delay time of the key task based on the current time and the task horizontal line time, and taking the first delay time as the overall delay time of the Flink application.
The first delay time of the key task is calculated based on the current time and the task horizontal line time, and the first delay time is taken as the overall delay time of the Flink application. Specifically, the first delay time is obtained by subtracting the task horizontal line time from the current time. The first delay time is a time length, such as 1 minute, 2 minutes or 52 seconds. The first delay time may be negative: a negative first delay time indicates that the key task is not delayed when executed by the node, whereas a positive first delay time indicates that the key task is delayed when executed by the node.
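The calculation in S203 is a single subtraction. The sketch below assumes both times are expressed as timestamps in seconds; the source does not say how a delay of exactly zero should be treated, so treating it as "not delayed" here is an assumption.

```python
def delay_seconds(current_time_s: float, horizontal_line_time_s: float) -> float:
    """Delay time = current time of the node minus the task horizontal line time."""
    return current_time_s - horizontal_line_time_s


def is_delayed(delay_s: float) -> bool:
    """A positive delay time means the task is delayed; a negative one means it is not.

    Zero is treated as not delayed (an assumption, not stated in the source).
    """
    return delay_s > 0
```

For example, a node whose current time is 120 seconds past the task horizontal line time yields a first delay time of 2 minutes.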
In the method provided by the embodiment of the invention, the overall delay time of the Flink application is determined based on the task horizontal line time of the key task and the current time of the node executing the key task, so the delay state of the Flink application at runtime can be obtained accurately, and whether the Flink application has a delay abnormality can be determined from the overall delay time.
After the overall delay time of the Flink application is determined, a delay record corresponding to the overall delay time needs to be generated, as follows: node information of the node executing the key task is collected; the node information and the first delay time are filled into a preset first record template to obtain the delay record corresponding to the overall delay time. The first record template in the invention is specifically: <sequence number> <node information> <first delay time>. The sequence number in the record template is assigned to the delay record when the delay record is generated, and it is unique. Using the first record template, the delay record corresponding to the overall delay time can be generated quickly, which improves the efficiency of monitoring the Flink application and simplifies the monitoring process.
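Filling the first record template can be sketched as below. The rendering (keeping the angle brackets and appending an `s` unit to the delay) and the use of an in-process counter for the unique sequence number are assumptions for illustration only.

```python
import itertools


class FirstRecordTemplate:
    """Fills <sequence number> <node information> <first delay time>."""

    def __init__(self) -> None:
        # Monotonic counter: each generated delay record gets a unique sequence number.
        self._next_seq = itertools.count(1)

    def fill(self, node_info: str, first_delay_s: float) -> str:
        seq = next(self._next_seq)
        return f"<{seq}> <{node_info}> <{first_delay_s:.0f}s>"
```

A counter held in memory is unique only within one process; a persistent monitoring platform would need a sequence source that survives restarts.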
Referring to fig. 3, a flowchart of another method for determining an overall delay time of a link application function based on each link task according to an embodiment of the present invention is specifically described below:
s301, for each Flink task, determining task horizontal line time of the Flink task and current time of a node executing the Flink task, and calculating second delay time of the Flink task based on the task horizontal line time and the current time.
The description of the second delay time in the present invention may refer to the description of the first delay time in fig. 2, and the calculation process of the second delay time is the same as the calculation process of the first delay time, and the description of the second delay time is not repeated here.
S302, performing a weighted average operation on the second delay times to obtain an average delay time, and taking the average delay time as the overall delay time of the Flink application.
One way to perform the weighted average operation on the second delay times is to sum the second delay times and divide the sum by the number of second delay times, that is, an average in which every Flink task carries equal weight. The resulting average delay time is taken as the overall delay time of the Flink application; the average delay time is a duration, such as 1 minute, 2 minutes or 40 seconds.
Performing the weighted average operation on the second delay times of the Flink tasks and taking the resulting average delay time as the overall delay time of the Flink application makes the overall delay time of the Flink application more representative and more general.
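Step S302 can be sketched as follows. With no weights supplied, the function reduces to the sum-divided-by-count calculation described in the text; the optional `weights` parameter is an assumption added to show what a non-uniform weighting could look like and is not specified by the patent.

```python
from typing import Optional, Sequence


def average_delay(delays: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-task delay times (seconds).

    When weights is None every task counts equally, which is the
    sum / count calculation given in the description.
    """
    if not delays:
        raise ValueError("at least one second delay time is required")
    if weights is None:
        weights = [1.0] * len(delays)
    if len(weights) != len(delays):
        raise ValueError("weights and delays must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in zip(delays, weights)) / total_weight
```

For second delay times of 60, 120 and 40 seconds, the equal-weight result is 220 / 3 seconds, which would then be taken as the overall delay time.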
After the overall delay time of the Flink application is determined, a delay record corresponding to the overall delay time also needs to be generated: for each Flink task, node information of the node executing the Flink task is determined; the average delay time, together with the node information and the second delay time of each Flink task, is then written into a preset second record template to generate the delay record corresponding to the overall delay time.
The second record template may be specifically:
< delay record number > < average delay time >;
< task number > < node information > < second delay time >;
Further, < delay record number > in the second record template is the number of the delay record, which is assigned to the delay record when the delay record is generated and is unique. Each < task number > < node information > < second delay time > entry records one Flink task, where the task number may be the execution order of the Flink task or the task number of the Flink task, and is unique to that Flink task. A delay record generated using the second record template contains a plurality of < task number > < node information > < second delay time > entries.
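A minimal sketch of filling the second record template is given below. The names `TaskDelay` and `render_second_record`, the use of seconds as the unit, and the equal-weight average are assumptions for illustration; only the two-part layout (one header line followed by one line per Flink task) comes from the template above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskDelay:
    task_number: str             # unique per Flink task
    node_info: str               # node executing the task
    second_delay_seconds: float  # second delay time, here in seconds


def render_second_record(record_number: int, tasks: List[TaskDelay]) -> str:
    """Render <delay record number> <average delay time>; plus one line per task."""
    if not tasks:
        raise ValueError("a delay record needs at least one Flink task")
    average = sum(t.second_delay_seconds for t in tasks) / len(tasks)
    lines = [f"<{record_number}> <{average}>;"]
    lines += [
        f"<{t.task_number}> <{t.node_info}> <{t.second_delay_seconds}>;"
        for t in tasks
    ]
    return "\n".join(lines)
```

Keeping the per-task lines next to the average lets a later reader of the record see which task or node contributed most to the overall delay.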
In the method provided by the embodiment of the invention, the data recorded by the delay record generated by using the second record template is more detailed, so that reliable and accurate data support is provided for the follow-up monitoring of the Flink application, and the monitoring of the Flink application is more accurate.
Corresponding to fig. 1, the embodiment of the invention also provides a device for monitoring the task of the streaming computing engine, which is applied to a monitoring platform and is used for supporting the application of the method for monitoring the task of the streaming computing engine provided by the embodiment of the invention in reality. The schematic structural diagram of the device provided by the embodiment of the invention is shown in fig. 4, and the specific description is as follows:
a first obtaining unit 401, configured to periodically obtain an application identifier of the Flink application according to a preset first time step;
A second obtaining unit 402, configured to obtain a task identifier list corresponding to the application identifier, where the task identifier list includes at least one task identifier;
A determining unit 403, configured to determine a Flink task corresponding to each task identifier, and determine an overall delay time of the Flink application based on each Flink task;
A generating unit 404, configured to generate a delay record corresponding to the overall delay time, mark a state of the delay record as an unread state, and write the marked delay record into a preset record file;
And the alarm unit 405 is configured to periodically invoke the monitoring module to read the record file according to a preset second time step, so that the monitoring module obtains each delay record in the record file with an unread state, updates the obtained state of each delay record to a read state, and analyzes each obtained delay record to determine whether delay records with abnormal delay exist in each delay record, and if delay records with abnormal delay exist, generate an alarm instruction, and send the alarm instruction to the preset alarm module, so that the alarm module performs delay alarm.
In the device provided by the embodiment of the invention, the application identifier of the Flink application is periodically acquired according to the first time step; acquiring a task identification list corresponding to the application identification, wherein the task identification list comprises at least one task identification; determining a Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on each Flink task; generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file; according to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record with the unread state in the record file, updates the obtained state of each delay record into the read state, analyzes each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, generates an alarm instruction if delay records with abnormal delay exist, and sends the alarm instruction to a preset alarm module, so that the alarm module carries out delay alarm. Based on each Flink task of the Flink application, the overall delay time of the Flink application is determined, a monitoring module is used for monitoring delay records corresponding to the overall delay time, and when delay records with delay abnormality exist, an alarm module is triggered in time to alarm, so that a worker can find and solve the problem of delay abnormality of the Flink application in time.
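One pass of the monitoring module over the record file might look like the following sketch. The JSON-lines file layout, the field names `state` and `overall_delay_seconds`, and the fixed threshold used to decide that a delay is abnormal are all assumptions for illustration; the patent text here does not specify the file format or the abnormality criterion.

```python
import json
from pathlib import Path
from typing import Callable, List


def monitor_pass(record_file: Path,
                 threshold_seconds: float,
                 alarm: Callable[[dict], None]) -> List[dict]:
    """Read unread delay records, mark them read, and alarm on abnormal ones."""
    text = record_file.read_text(encoding="utf-8")
    records = [json.loads(line) for line in text.splitlines() if line.strip()]

    # Select the records whose state is still "unread".
    unread = [r for r in records if r.get("state") == "unread"]

    # Update their state to "read" and persist the change.
    for record in unread:
        record["state"] = "read"
    record_file.write_text(
        "".join(json.dumps(r) + "\n" for r in records), encoding="utf-8"
    )

    # Assumed abnormality rule: overall delay above a fixed threshold.
    for record in unread:
        if record["overall_delay_seconds"] > threshold_seconds:
            alarm(record)
    return unread
```

Calling this function once per second time step means each delay record is analyzed exactly once, because records already marked as read are skipped on later passes.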
In the device provided by the embodiment of the invention, the device further comprises:
the storage unit is used for causing the monitoring module to store the obtained delay records to a preset data storage platform;
And the calling unit is used for calling a preset visualization component to process each delay record in the data storage platform, so that the visualization component displays each delay record in the data storage platform.
In the apparatus provided by the embodiment of the present invention, the determining unit 403 may include:
the first determining subunit is used for determining a key task in each Flink task;
A second determining subunit, configured to determine a task horizontal line time of the critical task, and determine a current time of a node that performs the critical task;
And the third determining subunit is used for determining the first delay time of the critical task based on the current time and the task horizontal line time, and taking the first delay time as the overall delay time of the Flink application.
In the apparatus provided by the embodiment of the present invention, the determining unit 403 may include:
A fourth determining subunit, configured to determine, for each of the Flink tasks, a task horizontal line time of the Flink task and a current time of a node that executes the Flink task, and calculate a second delay time of the Flink task based on the task horizontal line time and the current time;
And the operation subunit is used for carrying out weighted average operation on each second delay time to obtain average delay time, and taking the average delay time as the overall delay time of the Flink application.
The generating unit 404 provided by the embodiment of the present invention may include:
The acquisition subunit is used for acquiring node information of the node executing the key task;
And the obtaining subunit is used for filling the node information and the first delay time into a preset first record template to obtain a delay record corresponding to the overall delay time.
The generating unit 404 provided by the embodiment of the present invention may include:
A fifth determining subunit, configured to determine, for each of the Flink tasks, node information of a node that executes the Flink task;
And the generation subunit is used for writing the average delay time, the node information of each Flink task and the second delay time into a preset second record template, and generating a delay record corresponding to the overall delay time.
The embodiment of the invention also provides a storage medium, which comprises stored instructions, wherein the instructions, when run, control the device where the storage medium is located to execute the above method for monitoring tasks of the streaming computing engine.
The embodiment of the present invention further provides an electronic device, whose structural schematic diagram is shown in fig. 5, specifically including a memory 501 and one or more instructions 502, where the one or more instructions 502 are stored in the memory 501 and are configured to be executed by one or more processors 503 to perform the following operations:
Periodically acquiring an application identifier of the Flink application according to a preset first time step;
acquiring a task identification list corresponding to the application identification, wherein the task identification list comprises at least one task identification;
Determining a Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on each Flink task;
generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file;
According to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record with the unread state in the record file, updates the obtained state of each delay record into the read state, analyzes each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, generates an alarm instruction if delay records with abnormal delay exist, and sends the alarm instruction to a preset alarm module, so that the alarm module carries out delay alarm.
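The producing side of these operations (generating a delay record, marking it unread, writing it to the record file, and repeating on a fixed time step) can be sketched as below. The JSON-lines layout, the field names, and the helpers `write_delay_record` and `run_periodically` are hypothetical; a real deployment would more likely use a scheduler than a blocking loop.

```python
import json
import time
from pathlib import Path
from typing import Callable


def write_delay_record(record_file: Path, record_id: int,
                       overall_delay_seconds: float) -> dict:
    """Append one delay record, marked as unread, to the record file."""
    record = {
        "id": record_id,
        "overall_delay_seconds": overall_delay_seconds,
        "state": "unread",  # the monitoring module later switches this to "read"
    }
    with record_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def run_periodically(action: Callable[[], object],
                     step_seconds: float, cycles: int) -> None:
    """Run action once per time step for a fixed number of cycles."""
    for _ in range(cycles):
        action()
        time.sleep(step_seconds)
```

In this sketch the first time step would drive `write_delay_record`, while a separate loop running at the second time step would drive the monitoring module that reads the same file.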
The specific implementation process and derivative manner of the above embodiments are all within the protection scope of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for a system or system embodiment, since it is substantially similar to a method embodiment, the description is relatively simple, with reference to the description of the method embodiment being made in part. The systems and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A method for monitoring a task of a streaming computing engine, comprising:
Periodically acquiring an application identifier of the Flink application according to a preset first time step;
acquiring a task identification list corresponding to the application identification, wherein the task identification list comprises at least one task identification;
Determining a Flink task corresponding to each task identifier, and determining the overall delay time of the Flink application based on each Flink task;
generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file;
According to a preset second time step, periodically calling a monitoring module to read the record file, so that the monitoring module obtains each delay record with an unread state in the record file, updates the obtained state of each delay record into a read state, analyzes each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, generates an alarm instruction if delay records with abnormal delay exist, and sends the alarm instruction to a preset alarm module, so that the alarm module carries out delay alarm;
The determining the overall delay time of the Flink application based on each Flink task comprises the following steps:
Determining a key task in each Flink task;
Determining a task horizontal line time of the key task, and determining a current time of a node executing the key task;
And determining a first delay time of the key task based on the current time and the task horizontal line time, and taking the first delay time as the overall delay time of the Flink application.
2. The method as recited in claim 1, further comprising:
the monitoring module stores the obtained delay records to a preset data storage platform;
And calling a preset visualization component to process each delay record in the data storage platform, so that the visualization component displays each delay record in the data storage platform.
3. The method of claim 1, wherein the determining the overall latency of the Flink application based on each of the Flink tasks comprises:
For each Flink task, determining a task horizontal line time of the Flink task and a current time of a node executing the Flink task, and calculating a second delay time of the Flink task based on the task horizontal line time and the current time;
and carrying out weighted average operation on each second delay time to obtain average delay time, and taking the average delay time as the overall delay time of the Flink application.
4. The method of claim 1, wherein the generating a delay record corresponding to the overall delay time comprises:
collecting node information of the nodes for executing the key tasks;
and filling the node information and the first delay time into a preset first record template to obtain a delay record corresponding to the overall delay time.
5. A method according to claim 3, wherein said generating a delay record corresponding to an overall delay time comprises:
for each Flink task, determining node information of a node executing the Flink task;
and writing the average delay time, the node information of each Flink task and the second delay time into a preset second record template, and generating a delay record corresponding to the overall delay time.
6. A monitoring device for a streaming computing engine task, comprising:
the first acquisition unit is used for periodically acquiring the application identifier of the Flink application according to a preset first time step;
The second acquisition unit is used for acquiring a task identification list corresponding to the application identification, wherein the task identification list comprises at least one task identification;
the determining unit is used for determining the Flink task corresponding to each task identifier and determining the overall delay time of the Flink application based on each Flink task;
the generating unit is used for generating a delay record corresponding to the overall delay time, marking the state of the delay record as an unread state, and writing the marked delay record into a preset record file;
The alarm unit is used for periodically calling a monitoring module to read the record file according to a preset second time step, so that the monitoring module obtains each delay record with the unread state in the record file, updates the state of each obtained delay record into the read state, analyzes each obtained delay record to judge whether delay records with abnormal delay exist in each delay record, generates an alarm instruction if delay records with abnormal delay exist, and sends the alarm instruction to the preset alarm module, so that the alarm module carries out delay alarm;
The determination unit includes:
the first determining subunit is used for determining a key task in each Flink task;
A second determining subunit, configured to determine a task horizontal line time of the critical task, and determine a current time of a node that performs the critical task;
And the third determining subunit is used for determining the first delay time of the critical task based on the current time and the task horizontal line time, and taking the first delay time as the overall delay time of the Flink application.
7. The apparatus as recited in claim 6, further comprising:
the storage unit is used for causing the monitoring module to store the obtained delay records to a preset data storage platform;
And the calling unit is used for calling a preset visualization component to process each delay record in the data storage platform, so that the visualization component displays each delay record in the data storage platform.
8. The apparatus according to claim 6, wherein the determining unit includes:
A fourth determining subunit, configured to determine, for each of the Flink tasks, a task horizontal line time of the Flink task and a current time of a node that executes the Flink task, and calculate a second delay time of the Flink task based on the task horizontal line time and the current time;
And the operation subunit is used for carrying out weighted average operation on each second delay time to obtain average delay time, and taking the average delay time as the overall delay time of the Flink application.
CN202110639027.5A 2021-06-08 2021-06-08 Method and device for monitoring tasks of streaming computing engine Active CN113342608B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639027.5A CN113342608B (en) 2021-06-08 2021-06-08 Method and device for monitoring tasks of streaming computing engine

Publications (2)

Publication Number Publication Date
CN113342608A CN113342608A (en) 2021-09-03
CN113342608B true CN113342608B (en) 2024-06-21

Family

ID=77475406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639027.5A Active CN113342608B (en) 2021-06-08 2021-06-08 Method and device for monitoring tasks of streaming computing engine

Country Status (1)

Country Link
CN (1) CN113342608B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115328974B (en) * 2022-10-12 2022-12-13 南斗六星***集成有限公司 Data real-time detection method, device, equipment and readable storage medium
CN117408595B (en) * 2023-12-11 2024-04-30 上海文景信息科技有限公司 Block chain-based multi-mode intermodal whole-course quality control method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522719A (en) * 2020-04-27 2020-08-11 中国银行股份有限公司 Method and device for monitoring big data task state
CN111881011A (en) * 2020-07-31 2020-11-03 网易(杭州)网络有限公司 Log management method, platform, server and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130081001A1 (en) * 2011-09-23 2013-03-28 Microsoft Corporation Immediate delay tracker tool
CN109766198B (en) * 2018-12-28 2023-07-11 深圳前海微众银行股份有限公司 Stream processing method, device, equipment and computer readable storage medium
CN110532152A (en) * 2019-08-05 2019-12-03 北明云智(武汉)网软有限公司 A kind of monitoring alarm processing method and system based on Kapacitor computing engines
CN112767080A (en) * 2021-01-19 2021-05-07 上海微盟企业发展有限公司 Alarming method, device and medium based on stream type calculation

Also Published As

Publication number Publication date
CN113342608A (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant