CN117271185A - Method and device for detecting operation and computer storage medium - Google Patents


Info

Publication number
CN117271185A
Authority
CN
China
Prior art keywords
job
alarm
sub
model
interrupt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311217071.2A
Other languages
Chinese (zh)
Inventor
李小莉
郭锦帅
陈颖琪
韦英浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202311217071.2A priority Critical patent/CN117271185A/en
Publication of CN117271185A publication Critical patent/CN117271185A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079: Root cause analysis, i.e. error or fault diagnosis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a job detection method and device and a computer storage medium, which are applicable to the fields of cloud computing, big data or finance. The method comprises the following steps: for each sub-job, collecting operation parameters of the sub-job; when the number of failure records in the operation parameters exceeds a job error threshold, inputting an error log in the operation parameters into an alarm model, wherein the alarm model is constructed based on a historical sample data set; processing the error log based on the alarm model, and outputting a judging result; and if the judging result is an interrupt mechanism, interrupting the sub-job based on the interrupt mechanism. In the embodiment of the invention, while the execution of a batch job is monitored, the alarm model identifies for each sub-job whether that sub-job should be interrupted, thereby improving the speed of job processing.

Description

Method and device for detecting operation and computer storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for detecting a job, and a computer storage medium.
Background
At present, multiple batch jobs are run serially or in parallel. During batch job processing, the number of data errors in each job is recorded and processing continues with the next piece of data; when the recorded count exceeds an error threshold, the sub-job and the batch job are interrupted simultaneously. This approach applies only a one-size-fits-all interruption and then waits for manual handling to decide whether the batch job should continue, which slows job processing.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a method, an apparatus and a computer storage medium for detecting a job, so as to solve the problem in the prior art that the speed of job processing is reduced.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
a first aspect of an embodiment of the present invention shows a method for detecting a job, including:
for each sub-job, collecting operation parameters of the sub-job;
when the number of failure records in the operation parameters exceeds a job error threshold, inputting an error log in the operation parameters into an alarm model, wherein the alarm model is constructed based on a historical sample data set;
processing the error log based on the alarm model, and outputting a judging result;
and if the judging result is an interrupt mechanism, interrupting the sub-job based on the interrupt mechanism.
Optionally, the method further comprises:
after interrupting the sub-job based on the interrupt mechanism, determining whether to interrupt the operation of the batch job based on the operation parameters of the sub-job and an operation result, wherein the operation result is determined based on each piece of data in the collected sub-job operation process.
Optionally, the determining whether to interrupt the operation of the batch job based on the operation parameters and the operation results of the sub-job includes:
determining whether the sub-job is an important job or not based on the operation parameters and the operation results of the sub-job;
if yes, determining to interrupt the operation of the batch job;
if not, skipping the sub-job.
Optionally, the processing the error log based on the alarm model, outputting a determination result, includes:
extracting alarm factors from the error log based on the alarm model, wherein there are a plurality of alarm factors;
determining alarm tags to which each alarm factor belongs based on the alarm model, and counting the number of each alarm tag;
and outputting a judging result of the interrupt mechanism based on the alarm model if the number of the interrupt alarm tags is determined to be larger than a preset threshold value.
Optionally, the determining the alarm tag to which each alarm factor belongs includes:
classifying the alarm factors based on the alarm model, and determining the alarm type corresponding to each alarm factor;
and setting an alarm tag corresponding to the alarm type corresponding to each alarm factor based on the alarm model.
Optionally, if the number of other alarm tags is determined to be greater than the preset threshold, displaying the error log to the user based on the judging result, output by the alarm model, that other judgment is required;
acquiring an operation mechanism triggered by a user based on the error log;
and if the triggered operation mechanism is determined to be an interrupt mechanism, interrupting the sub-job operation based on the interrupt mechanism.
Optionally, the process of constructing the alert model based on the historical sample dataset includes:
acquiring a historical sample dataset;
dividing the historical sample data set into a training log set and a test log set;
training the initial model by using the training log set to obtain a trained initial model;
and testing the initial model based on the test log set, and determining the initial model passing the test as an alarm model.
Optionally, the method further comprises:
recording execution condition data of each sub-job;
and if the execution of the batch job is finished, generating a batch job condition report based on the execution condition data of each sub job, and displaying the report.
A second aspect of an embodiment of the present invention shows a job detection apparatus, the apparatus including:
the collecting unit is used for collecting the operation parameters of each sub-job;
the input unit is used for inputting the error log in the operation parameters into an alarm model when the number of failure records in the operation parameters exceeds the job error threshold, the alarm model being constructed by a construction unit;
the alarm model is used for processing the error log and outputting a judging result;
and the processing unit is used for interrupting the sub-job based on the interrupt mechanism if the judging result is the interrupt mechanism.
A third aspect of the embodiment of the present invention shows a computer storage medium, where the storage medium includes a storage program, where the program, when executed, controls a device in which the storage medium is located to execute a detection method of a job as shown in the first aspect of the embodiment of the present invention.
The job detection method and device and the computer storage medium provided by the embodiment of the invention comprise the following steps: for each sub-job, collecting operation parameters of the sub-job; when the number of failure records in the operation parameters exceeds a job error threshold, inputting an error log in the operation parameters into an alarm model, wherein the alarm model is constructed based on a historical sample data set; processing the error log based on the alarm model, and outputting a judging result; and if the judging result is an interrupt mechanism, interrupting the sub-job based on the interrupt mechanism. In the embodiment of the invention, while the execution of a batch job is monitored, the alarm model identifies for each sub-job whether that sub-job should be interrupted, thereby improving the speed of job processing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for detecting an operation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a detection process of a job according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for detecting an operation according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a detection device for an operation according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the description of "first", "second", etc. in this disclosure is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implying an indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
It should be noted that the method, the device and the computer storage medium for detecting the operation provided by the invention can be used in the cloud computing field, the big data field or the financial field. The foregoing is merely exemplary, and the application fields of the method and apparatus for detecting a job and the computer storage medium provided by the present invention are not limited.
Referring to fig. 1, a flow chart of a method for detecting a job according to an embodiment of the present invention is shown, where the method includes:
The job detection method of the invention is applied to a monitoring system. A batch job is input into a batch server for processing, and the batch server needs to detect each sub-job it runs while executing the sub-jobs of the batch job. As shown in fig. 2, the log.monitor.sh program contains the program implementing the method of steps S101 to S106 below, so that each sub-job is detected in real time; if a sub-job is interrupted, a mail or short message is sent to notify the responsible person in real time.
In fig. 2, a plurality of batch servers are shown, which may be, for example, batch server 1 and batch server 2.
Step S101: for each sub-job, the operating parameters of the sub-job are collected.
In the specific implementation of step S101, while the batch job is running, a timed call instruction or a pre-run trigger call instruction is received, and a pre-built job condition monitoring module is invoked to detect each sub-job in the batch job in real time; for each sub-job, the operation parameters of the sub-job are collected.
The operation parameters include key information such as the number of records read by the job, the number of successfully processed records, the number of failure records, and the error logs corresponding to the error causes.
The number of failure records refers to the number of pieces of data for which an error occurred while the job was executed.
The number of successfully processed records refers to the number of pieces of data that were executed successfully in the job.
Step S102: judging whether the number of failure records in the operation parameters exceeds a job error threshold; if yes, executing step S103, and if not, returning to step S101.
In the specific implementation of step S102, the number of failure records in the operation parameters is compared with the job error threshold; if the number of failure records exceeds the job error threshold, step S103 is executed, and if not, step S101 continues to be executed.
The job error threshold is set by a technician according to experiments or experience in advance, and refers to the maximum number of records allowed to fail in the job processing.
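Steps S101 and S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `RunParameters` class, its field names, the sample log text and the threshold value of 3 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunParameters:
    """Operation parameters collected for one sub-job (hypothetical field names)."""
    records_read: int = 0
    records_succeeded: int = 0
    records_failed: int = 0
    error_log: List[str] = field(default_factory=list)


def exceeds_job_error_threshold(params: RunParameters, threshold: int) -> bool:
    """Step S102: True when the failure record count exceeds the job error threshold."""
    return params.records_failed > threshold


# Step S101: parameters as they might be collected for one sub-job.
params = RunParameters(
    records_read=100,
    records_succeeded=95,
    records_failed=5,
    error_log=["NullPointerException while parsing record"] * 5,
)

# Only when this is True would the error log be passed to the alarm model (step S103).
needs_model = exceeds_job_error_threshold(params, threshold=3)
```

The comparison is strict ("exceeds"), so a sub-job whose failure count equals the threshold keeps running.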
Step S103: inputting the error log in the operation parameters into an alarm model.
Step S104: processing the error log based on the alarm model, and outputting a judging result.
In step S104, the alert model is constructed based on the historical sample dataset.
It should be noted that, the process of constructing the alert model based on the historical sample data set includes the following steps:
step S11: a historical sample dataset is obtained.
In the specific implementation of step S11, error logs of different sub-jobs are first collected, and each error log is processed to determine its alarm factors, alarm types and alarm tags; the alarm tags determine whether the sub-job corresponding to the error log is to be interrupted, that is, the judging result corresponding to the error log; the error logs, together with their alarm factors, alarm types, alarm tags and judging results, are taken as the historical sample data set.
It should be noted that the alarm types in the historical sample data set are preset by a technician according to the alarm factors; for example, if the alarm factor is a null pointer, the corresponding alarm type is a code writing error;
the alarm tags in the historical sample data set are preset according to the common characteristics or attributes of different alarm types; for example, an interrupt tag refers to alarm types that require interruption, such as code writing errors and text verification errors;
a skip tag refers to alarm types that can be skipped directly without interruption, such as timeouts;
step S12: the historical sample data set is divided into a training log set and a test log set.
In the specific implementation process of step S12, the historical sample data set is divided into a training log set and a test log set according to a preset proportion.
It should be noted that the preset ratio is set by a technician according to practical situations, for example, may be set to be 4:1.
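The split in step S12 can be sketched as below, using the 4:1 ratio given as an example. The function name `split_samples`, the shuffling before the cut and the fixed seed are assumptions made for illustration; the text only states that the set is divided according to a preset proportion.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T],
    train_parts: int = 4,
    test_parts: int = 1,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Divide samples into a training set and a test set by a preset ratio."""
    shuffled = list(samples)
    # Shuffling is an assumption; a fixed seed keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder samples split 4:1 into eight training and two test samples.
train_set, test_set = split_samples(range(10))
```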
Step S13: training the initial model by using the training log set to obtain a trained initial model;
step S14: and testing the initial model based on the test log set, and determining the initial model passing the test as an alarm model.
In the specific implementation of steps S13 to S14, the initial model is trained with the training log set to determine the relations among the different error logs, alarm factors, alarm types and alarm tags, thereby obtaining a trained initial model; the initial model is tested with the test log set until the judging result output by the test is consistent with the judging result corresponding to the test log set, at which point the test is determined to be passed, and the initial model that passed the test is taken as the alarm model.
Optionally, if the judging result output by the test is inconsistent with the judging result corresponding to the test log set, the initial model continues to be trained based on the training log set.
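The train-then-test flow of steps S13 and S14 can be illustrated with a deliberately simple stand-in model. The text leaves the model type open (a neural network or another machine learning model), so the majority-vote lookup table below, the `Sample` type and the sample data are assumptions used only to show the control flow.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (alarm factor, labelled judging result)


def train(samples: List[Sample]) -> Dict[str, str]:
    """Step S13 stand-in: learn, per alarm factor, the result seen most often."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for factor, result in samples:
        votes[factor][result] += 1
    return {factor: counts.most_common(1)[0][0] for factor, counts in votes.items()}


def passes_test(model: Dict[str, str], test_samples: List[Sample]) -> bool:
    """Step S14: the test passes when every output matches the labelled result."""
    return all(model.get(factor) == expected for factor, expected in test_samples)


training_log_set = [
    ("null pointer", "interrupt"),
    ("timeout", "skip"),
    ("null pointer", "interrupt"),
]
test_log_set = [("null pointer", "interrupt"), ("timeout", "skip")]

model = train(training_log_set)
# The trained model is accepted as the alarm model only if it passes the test;
# otherwise training would continue, as described above.
alarm_model = model if passes_test(model, test_log_set) else None
```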
Optionally, the alarm model is updated periodically, and a new historical sample dataset is obtained periodically to update the alarm model.
Corresponding evaluation mechanisms are set for two kinds of error, missed interruption (skipping a scenario that needs to be interrupted) and false interruption (interrupting a scenario that should be skipped); the alarm parameters, the alarm model and related parameters are tracked and evaluated, and adjusted in a timely manner.
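One way to track the two kinds of error named above is to compute a rate for each from labelled outcomes. The sketch below is an assumption about how such an evaluation could look: the `evaluate` function, the "interrupt"/"skip" labels and the sample history are hypothetical.

```python
from typing import List, Tuple


def evaluate(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (missed_rate, false_rate) from (expected, predicted) pairs.

    missed_rate: share of cases needing interruption that were skipped.
    false_rate:  share of skippable cases that were interrupted.
    """
    should_interrupt = [pred for exp, pred in pairs if exp == "interrupt"]
    should_skip = [pred for exp, pred in pairs if exp == "skip"]
    missed = sum(1 for pred in should_interrupt if pred == "skip")
    false = sum(1 for pred in should_skip if pred == "interrupt")
    missed_rate = missed / len(should_interrupt) if should_interrupt else 0.0
    false_rate = false / len(should_skip) if should_skip else 0.0
    return missed_rate, false_rate


history = [
    ("interrupt", "interrupt"),
    ("interrupt", "skip"),       # a missed interruption
    ("skip", "skip"),
    ("skip", "skip"),
    ("skip", "interrupt"),       # a false interruption
    ("skip", "skip"),
]
missed_rate, false_rate = evaluate(history)
```

Tracking the two rates separately matters because their costs differ: a missed interruption lets a faulty sub-job run on, while a false interruption stops work that could have continued.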
It should be noted that the initial model may be a general neural network model, a machine learning model, or the like.
In the embodiment of the present invention, the process of step S104 is specifically implemented, including the following steps:
Step S21: extracting alarm factors from the error log based on the alarm model, wherein there are a plurality of alarm factors;
it should be noted that an alarm factor refers to a piece of error information in the log, such as a null pointer, an overlong data length, or a nonexistent amount field.
In the specific implementation process of step S21, the alarm model extracts a corresponding alarm factor from the error log.
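As a rough illustration of step S21, alarm factors can be pulled out of a log with pattern matching. In the embodiment the extraction is done by the trained alarm model; the regular expressions, the `FACTOR_PATTERNS` table and the sample log lines below are hypothetical stand-ins.

```python
import re
from typing import Dict, List

# Hypothetical patterns, one per alarm factor.
FACTOR_PATTERNS: Dict[str, re.Pattern] = {
    "null pointer": re.compile(r"null\s*pointer", re.IGNORECASE),
    "data too long": re.compile(r"data (length )?too long", re.IGNORECASE),
    "timeout": re.compile(r"time(d)?\s*out", re.IGNORECASE),
}


def extract_alarm_factors(error_log: str) -> List[str]:
    """Return one alarm factor per matching pattern per log line, in log order."""
    factors: List[str] = []
    for line in error_log.splitlines():
        for name, pattern in FACTOR_PATTERNS.items():
            if pattern.search(line):
                factors.append(name)
    return factors


log = (
    "java.lang.NullPointerException\n"
    "request timed out\n"
    "Data too long for column 'memo'"
)
factors = extract_alarm_factors(log)
```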
Step S22: determining the alarm tag to which each alarm factor belongs based on the alarm model, and counting the number of each alarm tag.
It should be noted that, the process of step S22 is specifically implemented, including the following steps:
step S31: and classifying the alarm factors based on the alarm model, and determining the alarm type corresponding to each alarm factor.
It should be noted that, the alarm model determines the corresponding relationship between the alarm factors and the alarm types through training and testing according to the attribute classification of the alarm factors in advance, that is, different alarm types are set according to the attribute of the alarm factors.
For example, the alarm type corresponding to the alarm factor "null pointer" is a code writing error, and the alarm type corresponding to the alarm factor "overlong text data length" is a text verification error.
In the specific implementation process of step S31, the alarm model searches the corresponding relationship between the alarm factors and the alarm types, and determines the alarm type corresponding to each alarm factor.
Step S32: and setting an alarm tag corresponding to the alarm type corresponding to each alarm factor based on the alarm model.
It should be noted that, different alarm tags are set in advance according to the common characteristics or attributes among different alarm types, and the alarm model determines the corresponding relationship between the alarm types and the alarm tags through training and testing.
It should be further noted that the alarm tags include an interrupt tag, a skip tag, an other alarm tag, and the like.
An interrupt tag refers to alarm types that require interruption, such as code writing errors and text verification errors;
a skip tag refers to alarm types that can be skipped directly without interruption, such as timeouts;
an other alarm tag refers to cases where a person must decide whether to interrupt, such as a new error cause that did not appear when the model was trained.
In the specific implementation of step S32, the alarm model looks up the correspondence between alarm types and alarm tags and determines the alarm tag corresponding to each alarm type; the number of each alarm tag is then counted.
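Steps S31 and S32 amount to a two-stage lookup followed by a count. The sketch below uses the examples from the text (null pointer, overlong text data, timeout); the dictionary names and the choice to tag any unknown factor as "other" are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List

# Step S31: alarm factor -> alarm type (entries taken from the examples in the text).
FACTOR_TO_TYPE: Dict[str, str] = {
    "null pointer": "code writing error",
    "text data too long": "text verification error",
    "timeout": "timeout",
}

# Step S32: alarm type -> alarm tag.
TYPE_TO_TAG: Dict[str, str] = {
    "code writing error": "interrupt",
    "text verification error": "interrupt",
    "timeout": "skip",
}


def count_alarm_tags(factors: List[str]) -> Counter:
    """Map each factor to its tag and count the tags; unknown factors become 'other'."""
    tags = []
    for factor in factors:
        alarm_type = FACTOR_TO_TYPE.get(factor)
        tags.append(TYPE_TO_TAG.get(alarm_type, "other"))
    return Counter(tags)


tag_counts = count_alarm_tags(["null pointer", "timeout", "null pointer", "disk full"])
```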
Step S23: determining whether the number of interrupt alarm tags is greater than a preset threshold; if yes, executing step S24, and if not, executing step S25.
In the specific implementation of step S23, it is determined whether the number of interrupt alarm tags is greater than the preset threshold; if it is, step S24 is executed, and if not, step S25 is executed.
It should be noted that the preset threshold is set by the skilled person according to a plurality of experiments or experience.
Step S24: outputting a judging result of an interrupt mechanism based on the alarm model.
Step S25: determining whether the number of skip alarm tags is greater than the preset threshold; if yes, executing step S26, and if not, executing step S27.
It should be noted that the implementation of step S25 is the same as that of step S23 described above, and the two may be referred to each other.
Step S26: outputting a judging result of no interruption based on the alarm model.
Step S27: outputting a judging result requiring other judgment based on the alarm model.
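The branching of steps S23 to S27 can be written as a short function. This is a sketch under assumptions: the `decide` function, the string results and the use of one shared threshold for both checks are illustrative choices, since the text does not say whether the two preset thresholds are the same value.

```python
from typing import Mapping


def decide(tag_counts: Mapping[str, int], threshold: int) -> str:
    """Return the judging result for one error log from its alarm tag counts."""
    # Step S23 -> S24: enough interrupt tags means the sub-job is interrupted.
    if tag_counts.get("interrupt", 0) > threshold:
        return "interrupt"
    # Step S25 -> S26: otherwise, enough skip tags means no interruption.
    if tag_counts.get("skip", 0) > threshold:
        return "no interrupt"
    # Step S27: neither count is decisive, so other (manual) judgment is required.
    return "other"


result_a = decide({"interrupt": 3, "skip": 5}, threshold=2)
result_b = decide({"interrupt": 1, "skip": 5}, threshold=2)
result_c = decide({"interrupt": 1, "skip": 1, "other": 4}, threshold=2)
```

Because the interrupt check runs first, a log with many interrupt tags is interrupted even when it also carries many skip tags, which matches the order of steps S23 and S25.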
Optionally, the implementation process based on the step S27 above further includes the following steps after executing the step S27:
step S41: displaying the error log to a user;
In the specific implementation of step S41, if the alarm model cannot reach a determination from the error log, manual assistance is needed; the error log is therefore displayed to the user so that the user can decide, based on the error log, whether to interrupt the job.
Step S42: acquiring an operation mechanism triggered by a user based on the error log;
Step S43: judging whether the triggered operation mechanism is an interrupt mechanism; if yes, executing step S44, and if not, the sub-job is not interrupted.
In the specific implementation of steps S42 to S43, the operation mechanism triggered by the user is acquired and it is judged whether it is an interrupt mechanism; if the triggered operation mechanism is an interrupt mechanism, step S44 is executed, and if not, the sub-job is not interrupted.
Step S44: interrupting the sub-job operation based on the interrupt mechanism.
In the specific implementation of the process of step S44, the interrupt of the sub-job operation is performed by the interrupt mechanism.
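The manual path of steps S41 to S44 can be sketched with two callbacks, one standing in for the user interface and one for the interrupt mechanism. Both callbacks, the `manual_review` function and the "interrupt"/"skip" choice strings are hypothetical.

```python
from typing import Callable, List


def manual_review(
    error_log: str,
    ask_user: Callable[[str], str],
    interrupt_sub_job: Callable[[], None],
) -> bool:
    """Show the log to the user and interrupt only if the user chooses to."""
    choice = ask_user(error_log)      # Steps S41 and S42: display log, get the choice.
    if choice == "interrupt":         # Step S43: is it the interrupt mechanism?
        interrupt_sub_job()           # Step S44: interrupt the sub-job.
        return True
    return False


# Stand-ins for the user interface and the interrupt mechanism.
interrupted: List[bool] = []
was_interrupted = manual_review(
    "unrecognized error cause",
    ask_user=lambda log: "interrupt",
    interrupt_sub_job=lambda: interrupted.append(True),
)
```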
Step S105: judging whether the judging result is an interrupt mechanism; if yes, executing step S106, and if not, returning to step S101 to detect the next sub-job.
In the specific implementation of step S105, it is judged whether the judging result is an interrupt mechanism; if yes, step S106 is executed, and if not, the process returns to step S101 and the next sub-job is detected.
Step S106: interrupting the sub-job based on the interrupt mechanism.
In the specific implementation of the process of step S106, the interrupt of the sub-job operation is performed by the interrupt mechanism.
In the embodiment of the invention, for each sub-job, the operation parameters of the sub-job are collected; when the number of failure records in the operation parameters exceeds a job error threshold, an error log in the operation parameters is input into an alarm model, wherein the alarm model is constructed based on a historical sample data set; the error log is processed based on the alarm model, and a judging result is output; and if the judging result is an interrupt mechanism, the sub-job is interrupted based on the interrupt mechanism. In the invention, while the execution of a batch job is monitored, the alarm model identifies for each sub-job whether that sub-job should be interrupted, thereby improving the speed of job processing.
Based on the above-mentioned method for detecting a job shown in the embodiment of the present invention, referring to fig. 1 and fig. 3, another method for detecting a job is correspondingly disclosed in the embodiment of the present invention, where the method includes:
step S301: after interrupting the sub-job based on the interrupt mechanism, determining whether to interrupt the operation of the batch job based on the operation parameters and the operation results of the sub-job, if yes, executing step S302, and if not, executing step S303.
In step S301, the operation result is determined based on each piece of data in the collected sub-job operation process.
It should be noted that, in the specific implementation process of step S301, the method includes the following steps:
step S51: determining whether the sub-job is an important job or not based on the operation parameters and the operation results of the sub-job; if yes, go to step S302, if no, go to step S303.
In the specific implementation of step S51, each piece of data collected during the running of the sub-job is executed to obtain an operation result for that piece of data; it is then determined whether the operation results and the operation parameters need to be called by other jobs; if yes, the sub-job is determined to be an important job and step S302 is executed; if not, the sub-job is determined not to be an important job and step S303 is executed.
Step S302: interrupting the operation of the batch job.
Step S303: skipping the sub-job.
In the specific implementation process of step S303, the sub-job is skipped, and the next sub-job in the batch job is executed until all sub-jobs are executed.
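The decision in steps S301 to S303 can be sketched as a dependency check: a sub-job counts as important when other jobs need its results. The `consumers` mapping, the job names and both function names are assumptions introduced for illustration.

```python
from typing import Dict, Set


def is_important(sub_job: str, consumers: Dict[str, Set[str]]) -> bool:
    """Step S51: a sub-job is important if any other job calls its results."""
    return bool(consumers.get(sub_job))


def after_sub_job_interrupt(sub_job: str, consumers: Dict[str, Set[str]]) -> str:
    """Step S302 for important sub-jobs, step S303 otherwise."""
    if is_important(sub_job, consumers):
        return "interrupt batch"
    return "skip sub-job"


# Hypothetical dependency map: which jobs consume each sub-job's output.
consumers = {
    "load_accounts": {"compute_interest", "build_statements"},
    "archive_logs": set(),
}
action_a = after_sub_job_interrupt("load_accounts", consumers)
action_b = after_sub_job_interrupt("archive_logs", consumers)
```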
Optionally, the method further comprises the steps of:
step S61: for each sub-job, execution condition data of the sub-job is recorded.
In the specific implementation of step S61, after each sub-job is executed, the execution condition data of that sub-job is recorded.
Step S62: and judging whether the execution of the batch job is finished, if so, executing the step S63, and if not, returning to the step S61.
In the specific implementation of step S62, it is determined whether each sub-job in the batch job is executed, if so, step S63 is executed, and if not, step S61 is executed again.
Step S63: and generating a batch job condition report based on the execution condition data of each sub job, and displaying the batch job condition report.
Step S64: integrating the execution condition data of each sub-job, generating a batch job condition report to show the specific running condition of the batch job, and sending the report to the mailbox of the responsible person.
With continued reference to fig. 2, fig. 2 also shows a log monitoring report, specifically, a batch job status report, i.e., a log monitoring report, is generated after each job in the batch job is executed.
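Building the batch job condition report from the recorded execution data can be sketched as plain text assembly. The record keys, the report layout and the `build_report` function are hypothetical, and sending the report by mail is left out.

```python
from typing import Dict, List


def build_report(records: List[Dict[str, object]]) -> str:
    """Combine per-sub-job execution data into one text report."""
    lines = ["Batch job status report"]
    for rec in records:
        lines.append(
            f"{rec['name']}: read={rec['read']} ok={rec['ok']} "
            f"failed={rec['failed']} status={rec['status']}"
        )
    interrupted = sum(1 for rec in records if rec["status"] == "interrupted")
    lines.append(f"sub-jobs: {len(records)}, interrupted: {interrupted}")
    return "\n".join(lines)


# Execution condition data as it might be recorded in step S61.
records = [
    {"name": "load_accounts", "read": 100, "ok": 100, "failed": 0, "status": "finished"},
    {"name": "archive_logs", "read": 50, "ok": 44, "failed": 6, "status": "interrupted"},
]
report = build_report(records)
```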
Optionally, learnable empirical data may be retained to assist the operation module in completing the next operation.
In the embodiment of the invention, after the sub-job is interrupted based on the interrupt mechanism, whether to interrupt the operation of the batch job is determined based on the operation parameters and the operation results of the sub-job. In the invention, in the process of monitoring the execution of batch jobs, aiming at each job, whether the sub-job is interrupted or not is identified through an alarm model, and then whether the whole batch job is interrupted or not is judged; thereby improving the speed of the job processing.
Based on the above-mentioned method for detecting a job shown in the embodiment of the present invention, correspondingly, the embodiment of the present invention shows a device for detecting a job, as shown in fig. 4, where the device includes:
an acquisition unit 401, configured to acquire, for each sub-job, an operation parameter of the sub-job;
an input unit 402, configured to input, when it is determined that the number of failure records in the operation parameters exceeds a job error threshold, an error log in the operation parameters into an alarm model 403, the alarm model 403 being constructed by the construction unit 405;
the alarm model 403 is configured to process the error log and output a determination result;
and the processing unit 404 is configured to interrupt the sub-job based on the interrupt mechanism if the determination result is the interrupt mechanism.
It should be noted that the specific principle and execution process of each unit in the job detection device disclosed in the embodiment of the present application are the same as those of the job detection method shown in the embodiment of the present application; reference may be made to the corresponding parts of that method, and the description is not repeated here.
In the embodiment of the invention, aiming at each sub-job, the operation parameters of the sub-job are collected; when the number of failure records in the operation parameters exceeds a work error threshold, inputting an error log in the operation parameters into an alarm model, wherein the alarm model is constructed based on a historical sample data set; processing the error log based on the alarm model, and outputting a judging result; and if the judging result is an interrupt mechanism, interrupting the sub-job based on the interrupt mechanism. In the invention, in the process of monitoring batch job execution, aiming at each job, whether the sub-job is interrupted is identified through an alarm model, so that the problem of improving the speed of job processing is solved.
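The flow summarized above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the implementation described in the application: the names `OperationParams`, `should_interrupt_sub_job`, `JOB_ERROR_THRESHOLD`, and the stand-in `toy_alarm_model` are hypothetical, and the threshold value is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical value; the source does not state a concrete threshold.
JOB_ERROR_THRESHOLD = 3


@dataclass
class OperationParams:
    """Operation parameters collected for one sub-job (assumed shape)."""
    failed_records: int
    error_log: List[str] = field(default_factory=list)


def should_interrupt_sub_job(params: OperationParams,
                             alarm_model: Callable[[List[str]], str]) -> bool:
    """Return True when the sub-job should be interrupted."""
    # The alarm model is consulted only when failures exceed the threshold.
    if params.failed_records <= JOB_ERROR_THRESHOLD:
        return False
    determination = alarm_model(params.error_log)
    return determination == "interrupt"


def toy_alarm_model(error_log: List[str]) -> str:
    """Stand-in for the trained alarm model: flags any FATAL line."""
    return "interrupt" if any("FATAL" in line for line in error_log) else "ignore"
```

For example, `should_interrupt_sub_job(OperationParams(5, ["FATAL: db down"]), toy_alarm_model)` evaluates to `True`, while the same log with only one failed record does not reach the model at all.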
Optionally, based on the detection device for a job shown in the foregoing embodiment of the present invention, the processing unit 404 is further configured to:
after interrupting the sub-job based on the interrupt mechanism, determining whether to interrupt the running of the batch job based on the operation parameters and an operation result of the sub-job, wherein the operation result is determined based on each piece of data collected while the sub-job runs.
Optionally, based on the job detection device shown in the foregoing embodiment of the present invention, the processing unit 404, when determining whether to interrupt the running of the batch job based on the operation parameters and the operation result of the sub-job, is specifically configured to:
determining whether the sub-job is an important job or not based on the operation parameters and the operation results of the sub-job;
if yes, determining to interrupt the operation of the batch job.
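A rough sketch of this batch-level decision is given below. The source does not say how importance is derived from the operation parameters and the operation result, so the rule used here (a failed sub-job that other sub-jobs depend on counts as important) and the names `SubJobStatus`, `is_important_job`, and `should_interrupt_batch` are purely assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SubJobStatus:
    name: str
    has_dependents: bool  # assumed operation parameter
    succeeded: bool       # assumed operation result


def is_important_job(status: SubJobStatus) -> bool:
    """Assumed rule: a failed sub-job that others depend on is important."""
    return status.has_dependents and not status.succeeded


def should_interrupt_batch(status: SubJobStatus) -> bool:
    """Interrupt the whole batch only when the interrupted sub-job is important."""
    return is_important_job(status)
```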
Optionally, based on the job detection device shown in the foregoing embodiment of the present invention, the alarm model 403 is specifically configured to:
extracting alarm factors from the error log based on the alarm model, wherein there are a plurality of alarm factors;
determining alarm tags to which each alarm factor belongs based on the alarm model, and counting the number of each alarm tag;
and if the number of interrupt alarm tags is determined to be greater than a preset threshold, outputting, based on the alarm model, a determination result of the interrupt mechanism.
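The three steps above (extract factors, tag and count them, compare against a threshold) might look like the following sketch. A keyword table stands in for the trained alarm model; `TAG_RULES`, `PRESET_THRESHOLD`, `extract_alarm_factors`, and `judge` are hypothetical names, and the keywords and threshold are invented for the example.

```python
from collections import Counter
from typing import List

# Hypothetical keyword -> alarm tag table standing in for the trained model.
TAG_RULES = {
    "timeout": "interrupt",
    "connection refused": "interrupt",
    "deprecated": "ignore",
}
PRESET_THRESHOLD = 1  # assumed value


def extract_alarm_factors(error_log: List[str]) -> List[str]:
    """Collect every known keyword found in the log lines."""
    factors = []
    for line in error_log:
        lowered = line.lower()
        factors.extend(key for key in TAG_RULES if key in lowered)
    return factors


def judge(error_log: List[str]) -> str:
    """Count alarm tags and report 'interrupt' when they exceed the threshold."""
    tag_counts = Counter(TAG_RULES[f] for f in extract_alarm_factors(error_log))
    if tag_counts["interrupt"] > PRESET_THRESHOLD:
        return "interrupt"
    return "no_interrupt"
```

With these assumed values, a log containing both a timeout and a refused connection yields two interrupt tags and therefore an `"interrupt"` result; a single timeout does not.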
Optionally, based on the detection device of a job shown in the above embodiment of the present invention, the construction unit 405 is configured to:
acquiring a historical sample dataset;
dividing the historical sample data set into a training log set and a test log set;
training an initial model by using the training log set to obtain a trained initial model;
and testing the trained initial model based on the test log set, and determining the initial model that passes the test as the alarm model.
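The construction steps above can be illustrated with a small, dependency-free sketch. The source does not specify the model type, so a simple word-vote classifier is swapped in here; the function names, the 25% test ratio, and the 0.8 accuracy bar for "passing the test" are all assumptions.

```python
import random
from collections import Counter, defaultdict


def split_dataset(samples, test_ratio=0.25, seed=0):
    """Divide (log_text, label) samples into a training set and a test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def train(training_set):
    """Count, for every word, how often it appears under each label."""
    word_labels = defaultdict(Counter)
    for text, label in training_set:
        for word in text.lower().split():
            word_labels[word][label] += 1
    return word_labels


def predict(model, text, default="other"):
    """Let each known word vote for the labels it was seen with."""
    votes = Counter()
    for word in text.lower().split():
        votes.update(model.get(word, {}))
    return votes.most_common(1)[0][0] if votes else default


def build_alarm_model(samples, min_accuracy=0.8):
    """Train, test, and return the model only if it passes the test."""
    training_set, test_set = split_dataset(samples)
    model = train(training_set)
    correct = sum(predict(model, text) == label for text, label in test_set)
    return model if correct / len(test_set) >= min_accuracy else None
```

Returning `None` for a model that fails the test is one possible design; a real pipeline might instead retrain with different settings.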
Optionally, based on the job detection device shown in the foregoing embodiment of the present invention, the alarm model 403, when determining the alarm tag to which each alarm factor belongs, is specifically configured to:
classifying the alarm factors based on the alarm model, and determining the alarm type corresponding to each alarm factor;
and setting an alarm tag corresponding to the alarm type corresponding to each alarm factor based on the alarm model.
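As a hedged illustration of this two-stage mapping (alarm factor to alarm type, then alarm type to alarm tag), the sketch below uses two lookup tables in place of the trained model. The table contents and the names `ALARM_TYPE_BY_FACTOR`, `TAG_BY_ALARM_TYPE`, and `tag_alarm_factors` are invented for the example.

```python
from typing import Dict, List

# Hypothetical classification table: alarm factor -> alarm type.
ALARM_TYPE_BY_FACTOR = {
    "connection refused": "network",
    "disk full": "storage",
    "deprecated": "notice",
}

# Hypothetical tagging table: alarm type -> alarm tag.
TAG_BY_ALARM_TYPE = {
    "network": "interrupt",
    "storage": "interrupt",
    "notice": "ignore",
    "unknown": "other",
}


def tag_alarm_factors(factors: List[str]) -> Dict[str, str]:
    """Classify each factor into a type, then set the tag for that type."""
    tags = {}
    for factor in factors:
        alarm_type = ALARM_TYPE_BY_FACTOR.get(factor, "unknown")
        tags[factor] = TAG_BY_ALARM_TYPE[alarm_type]
    return tags
```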
Optionally, based on the detection device for a job shown in the foregoing embodiment of the present invention, the processing unit 404 is further configured to:
if the number of "other" alarm tags is determined to be greater than a preset threshold, displaying the error log to a user based on the determination result, output by the alarm model, that further judgment is required;
acquiring the operation mechanism triggered by the user based on the error log;
and if the triggered operation mechanism is determined to be an interrupt mechanism, interrupting the running of the sub-job based on the interrupt mechanism.
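The manual-review path above could be sketched as follows. The callback `ask_user` stands in for whatever interface displays the error log and collects the user's choice; it, the function name `handle_other_tags`, and the default threshold are assumptions rather than details from the source.

```python
from typing import Callable, Dict, List


def handle_other_tags(tag_counts: Dict[str, int],
                      error_log: List[str],
                      ask_user: Callable[[List[str]], str],
                      preset_threshold: int = 2) -> bool:
    """Return True when the user-triggered mechanism is an interrupt."""
    # Escalate to a person only when "other" tags exceed the threshold.
    if tag_counts.get("other", 0) <= preset_threshold:
        return False
    # Show the error log and obtain the mechanism the user triggers.
    chosen_mechanism = ask_user(error_log)
    return chosen_mechanism == "interrupt"
```

Passing the prompt in as a callback keeps the decision logic testable without a real user interface.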
Optionally, based on the detection device for a job shown in the foregoing embodiment of the present invention, the processing unit 404 is further configured to:
recording execution condition data of each sub-job;
and if the execution of the batch job is finished, generating a batch job condition report based on the execution condition data of each sub job, and displaying the report.
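A minimal sketch of the recording and reporting step is shown below. The record fields and the plain-text report layout are assumptions; the source only states that execution condition data is recorded per sub-job and that a report is generated and displayed once the batch job finishes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExecutionRecord:
    """Execution condition data for one sub-job (assumed fields)."""
    sub_job: str
    status: str  # e.g. "succeeded", "interrupted", "skipped"
    failed_records: int


def generate_batch_report(records: List[ExecutionRecord]) -> str:
    """Build a plain-text report from the per-sub-job records."""
    lines = ["Batch job report", f"Sub-jobs run: {len(records)}"]
    for record in records:
        lines.append(
            f"- {record.sub_job}: {record.status} "
            f"({record.failed_records} failed records)"
        )
    interrupted = sum(record.status == "interrupted" for record in records)
    lines.append(f"Interrupted sub-jobs: {interrupted}")
    return "\n".join(lines)
```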
Based on the job detection device disclosed in the embodiments of the present disclosure, each module may be implemented by a hardware device composed of a processor and a memory. Specifically, the above modules are stored in the memory as program units, and the processor executes the program units stored in the memory to implement job detection.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and job detection is implemented by adjusting kernel parameters.
The embodiment of the disclosure provides a computer storage medium including a stored text processing program, wherein the program when executed by a processor implements the method of detecting a job described in fig. 1 and 3.
The embodiment of the disclosure provides a processor for running a program, wherein the program runs to execute the detection method of the job described in fig. 1 and 3.
The embodiment of the disclosure provides an electronic device, which may be a server, a PC, a PAD, a mobile phone, etc.
The electronic device includes at least one processor, and at least one memory coupled to the processor, and a bus.
The processor and the memory communicate with each other through the bus. The processor is configured to execute the pipeline generation program stored in the memory.
The memory is configured to store the program.
The present application also provides a computer program product which, when executed on an electronic device, is adapted to execute a program initialized with the above method steps.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, the device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include, among other forms of computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip. Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system or system embodiments are substantially similar to the method embodiments, their description is relatively brief, and for relevant parts reference may be made to the description of the method embodiments. The systems and system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of job detection, the method comprising:
for each sub-job, collecting operation parameters of the sub-job;
when the number of failure records in the operation parameters exceeds a job error threshold, inputting an error log in the operation parameters into an alarm model, wherein the alarm model is constructed based on a historical sample data set;
processing the error log based on the alarm model, and outputting a judging result;
and if the judging result is an interrupt mechanism, interrupting the sub-job based on the interrupt mechanism.
2. The method as recited in claim 1, further comprising:
after interrupting the sub-job based on the interrupt mechanism, determining whether to interrupt the operation of the batch job based on the operation parameters of the sub-job and an operation result, wherein the operation result is determined based on each piece of data in the collected sub-job operation process.
3. The method of claim 2, wherein the determining whether to interrupt the operation of the batch job based on the operation parameters and the operation results of the sub-job comprises:
determining whether the sub-job is an important job or not based on the operation parameters and the operation results of the sub-job;
if yes, determining to interrupt the operation of the batch job;
if not, skipping the sub-job.
4. The method of claim 1, wherein the processing the error log based on the alarm model and outputting a judging result comprises:
extracting alarm factors from the error log based on the alarm model, wherein there are a plurality of alarm factors;
determining, based on the alarm model, the alarm tag to which each alarm factor belongs, and counting the number of each alarm tag;
and if the number of interrupt alarm tags is determined to be greater than a preset threshold, outputting, based on the alarm model, a judging result of the interrupt mechanism.
5. The method of claim 4, wherein determining the alarm tag to which each alarm factor belongs comprises:
classifying the alarm factors based on the alarm model, and determining the alarm type corresponding to each alarm factor;
and setting an alarm tag corresponding to the alarm type corresponding to each alarm factor based on the alarm model.
6. The method as recited in claim 4, further comprising:
if the number of "other" alarm tags is determined to be greater than a preset threshold, displaying the error log to a user based on the judging result, output by the alarm model, that further judgment is required;
acquiring the operation mechanism triggered by the user based on the error log;
and if the triggered operation mechanism is determined to be an interrupt mechanism, interrupting the running of the sub-job based on the interrupt mechanism.
7. The method of claim 1, wherein constructing the alarm model based on the historical sample data set comprises:
acquiring a historical sample dataset;
dividing the historical sample data set into a training log set and a test log set;
training the initial model by using the training log set to obtain a trained initial model;
and testing the initial model based on the test log set, and determining the initial model passing the test as an alarm model.
8. The method as recited in claim 1, further comprising:
recording execution condition data of each sub-job;
and if the execution of the batch job is finished, generating a batch job condition report based on the execution condition data of each sub job, and displaying the report.
9. A job detection apparatus, the apparatus comprising:
the collecting unit is used for collecting the operation parameters of each sub-job;
the input unit is used for inputting the error log in the operation parameters into an alarm model when the number of failure records in the operation parameters exceeds a job error threshold, the alarm model being constructed by a construction unit;
the alarm model is used for processing the error log and outputting a judging result;
and the processing unit is used for interrupting the sub-job based on the interrupt mechanism if the judging result is the interrupt mechanism.
10. A computer storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the method of detecting a job according to any one of claims 1-8.
CN202311217071.2A 2023-09-20 2023-09-20 Method and device for detecting operation and computer storage medium Pending CN117271185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311217071.2A CN117271185A (en) 2023-09-20 2023-09-20 Method and device for detecting operation and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311217071.2A CN117271185A (en) 2023-09-20 2023-09-20 Method and device for detecting operation and computer storage medium

Publications (1)

Publication Number Publication Date
CN117271185A true CN117271185A (en) 2023-12-22

Family

ID=89220833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311217071.2A Pending CN117271185A (en) 2023-09-20 2023-09-20 Method and device for detecting operation and computer storage medium

Country Status (1)

Country Link
CN (1) CN117271185A (en)

Similar Documents

Publication Publication Date Title
CN110661659B (en) Alarm method, device and system and electronic equipment
EP3425524A1 (en) Cloud platform-based client application data calculation method and device
CN109284269B (en) Abnormal log analysis method and device, storage medium and server
CN110888783B (en) Method and device for monitoring micro-service system and electronic equipment
CN106897178B (en) Slow disk detection method and system based on extreme learning machine
AU2019275633B2 (en) System and method of automated fault correction in a network environment
US20200073738A1 (en) Error incident fingerprinting with unique static identifiers
CN113641526B (en) Alarm root cause positioning method and device, electronic equipment and computer storage medium
EP3470988A1 (en) Method for replicating production behaviours in a development environment
CN111767957A (en) Method and device for detecting log abnormity, storage medium and electronic equipment
CN114048099A (en) Java application monitoring method and device, storage medium and electronic equipment
CN112131078B (en) Method and equipment for monitoring disk capacity
CN112100035A (en) Page abnormity detection method, system and related device
CN117271185A (en) Method and device for detecting operation and computer storage medium
CN116881100A (en) Log detection method, log alarm method, system, equipment and storage medium
CN111737158B (en) Abnormal assertion processing method and device, electronic equipment and storage medium
CN115168171A (en) Webpage exception handling method and device, electronic equipment and medium
WO2019125491A1 (en) Application behavior identification
CN108234196B (en) Fault detection method and device
WO2022015313A1 (en) Generation of alerts of correlated time-series behavior of environments
CN112530505A (en) Hard disk delay detection method and device and computer readable storage medium
CN113138872A (en) Abnormal processing device and method for database system
US20230075065A1 (en) Passive inferencing of signal following in multivariate anomaly detection
Khan Time-Series Trend-Based Multi-Level Adaptive Execution Tracing
CN117407293A (en) Method, system, device, storage medium and electronic equipment for detecting target program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination