CN109542737A - Platform alert processing method, device, electronic device and storage medium - Google Patents

Platform alert processing method, device, electronic device and storage medium

Info

Publication number
CN109542737A
Authority
CN
China
Prior art keywords
alarm
error
platform
big data
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811151626.7A
Other languages
Chinese (zh)
Inventor
范亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201811151626.7A
Publication of CN109542737A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A platform alarm processing method, comprising: receiving at least one alarm signal from a big data platform; obtaining the error log corresponding to the alarm signal and analyzing the alarm category corresponding to the error log; determining a corresponding control instruction according to the alarm category obtained by the analysis; and sending the determined control instruction to the big data platform, where the control instruction is used to control the big data platform to perform a corresponding operation. The present invention also provides a platform alarm processing device, an electronic device, and a computer-readable storage medium. The present invention helps improve the efficiency of alarm processing, automates alarm processing, and improves the efficiency of security monitoring.

Description

Platform alert processing method, device, electronic device and storage medium
Technical field
The present invention relates to alarm processing methods, and in particular to a platform alarm processing method, a platform alarm processing device, an electronic device, and a computer-readable storage medium.
Background
As social informatization technology continues to advance and Internet technology spreads rapidly, demand for processing massive data is growing in every field, and big data processing platforms capable of processing massive data efficiently have emerged. A big data platform can be regarded as a distributed platform built by combining multiple service components, selected according to business needs and actual data processing requirements. When data is processed on a big data platform, each service component works independently, yet the components also cooperate with one another; if a service process in one service component becomes abnormal, the entire data processing flow is likely to be affected. The operation of the big data platform therefore needs to be monitored, and alarms need to be raised promptly when anomalies occur, to ensure the timeliness and accuracy of data processing. However, the alarm systems currently on the market do not handle alarm information in a sufficiently automated way, and their alarm processing efficiency is low.
Summary of the invention
In view of the above, it is necessary to provide a platform alarm processing method, a platform alarm processing device, an electronic device, and a computer-readable storage medium that can solve the above problem.
A preferred embodiment of the present invention provides a platform alarm processing method, comprising: receiving at least one alarm signal from a big data platform; obtaining the error log corresponding to the alarm signal and analyzing the alarm category corresponding to the error log; determining a corresponding control instruction according to the alarm category obtained by the analysis; and sending the determined control instruction to the big data platform, where the control instruction is used to control the big data platform to perform a corresponding operation.
In one possible implementation, the alarm signal includes identification information of a service process in the big data platform that has run into an error or a resource problem, and obtaining the error log corresponding to the alarm signal includes: parsing the identification information of the service process contained in the alarm signal; generating a log acquisition request according to the identification information obtained; and sending the log acquisition request to the big data platform, where the log acquisition request is used to control the big data platform to send the error log generated by the corresponding service process to the electronic device.
In one possible implementation, analyzing the alarm category corresponding to the error log includes: identifying whether the error log contains at least one preset error keyword; and, when the error log contains a preset error keyword, determining the alarm category corresponding to that keyword according to an error information table, where the error information table includes multiple preset error keywords and multiple alarm categories, and each alarm category corresponds to at least one preset error keyword.
In one possible implementation, the error log is marked with a flag bit, and the flag bit corresponds to an alarm level; the alarm level indicates that, when there are multiple error logs, error logs with a higher alarm level are processed first. When more than one alarm signal is received, the method further includes, before identifying whether the error log contains at least one preset error keyword: identifying the flag bit recorded in each error log and determining the alarm level of each error log according to its flag bit. The error logs are then checked for preset error keywords one after another in order of alarm level.
In one possible implementation, the alarm categories include at least a first-class alarm, a second-class alarm, and a third-class alarm. The first-class alarm covers big data platform environment and resource problems, the second-class alarm covers task script problems, and the third-class alarm covers daily tasks that were not completed. The first-class alarm category corresponds to a first control instruction, which is used to control the big data platform to rerun the task directly. The second-class alarm category corresponds to a second control instruction, which is used to control the big data platform to stop the task. The third-class alarm category corresponds to a third control instruction, which is used to control the big data platform to pause the task.
A preferred embodiment of the present invention also provides a platform alarm processing device, comprising: a receiving module, configured to receive at least one alarm signal from a big data platform and further configured to obtain the error log corresponding to the alarm signal; an analysis module, configured to analyze the alarm category corresponding to the error log; a determining module, configured to determine a corresponding control instruction according to the alarm category obtained by the analysis module; and a sending module, configured to send the control instruction determined by the determining module to the big data platform, where the control instruction is used to control the big data platform to perform a corresponding operation.
A preferred embodiment of the present invention also provides an electronic device including a processor and a memory. A platform alarm processing program is stored in the memory, and the processor executes the platform alarm processing program to implement the platform alarm processing method described above.
A preferred embodiment of the present invention also provides a computer-readable storage medium on which a platform alarm processing program is stored; when executed by a processor, the platform alarm processing program implements the platform alarm processing method described above.
Embodiments of the present invention can analyze the alarms raised when a service component of the big data platform runs into an error or a resource problem, classify the cause of each alarm, and execute the processing strategy that corresponds to its category, which helps improve the efficiency of alarm processing and automates it. Furthermore, because alarms can be handled promptly, the multiple service components in the big data platform can keep cooperating and interacting, and an anomaly in one service component is prevented from affecting the entire data processing flow.
Brief description of the drawings
Fig. 1 is a flowchart of the platform alarm processing method provided by a preferred embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the platform alarm processing device provided by a preferred embodiment of the present invention.
Fig. 3 is a schematic structural diagram of the electronic device provided by a preferred embodiment of the present invention.
Description of main reference numerals
The following detailed description further explains the present invention with reference to the above drawings.
Detailed description of the embodiments
To make the objects, features, and advantages of the present invention easier to understand, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments can be combined with one another.
Numerous specific details are set forth in the following description to aid a full understanding of the present invention. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in this specification are intended only to describe specific embodiments and are not intended to limit the present invention.
The electronic device includes a memory and a processor. Those skilled in the art will understand that, in the embodiments of the present invention, the schematic diagram shown in Fig. 3 is merely an example of the electronic device and does not limit it; the electronic device may include more or fewer components than shown, may combine certain components, or may use different components. For example, the electronic device may further include input/output devices, network access devices, buses, and the like.
The electronic device is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
Specifically, the electronic device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touchpad, voice-control device, or similar means, for example a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), or an Internet Protocol television (IPTV).
Fig. 1 is a flowchart of the platform alarm processing method provided by a preferred embodiment of the present invention. The platform alarm processing method is applied in an electronic device 1. Depending on requirements, the order of the steps of the method may be changed, and certain steps may be omitted or combined. The platform alarm processing method includes the following steps:
Step S11: receive at least one alarm signal from a big data platform.
The big data platform may be a distributed platform running multiple service components, and a service component may consist of one master node and at least one slave node. For example, for HDFS on the Hadoop distributed platform, the master node is the NameNode and the slave nodes are DataNodes. The master node and the slave nodes of a service component can each be regarded as a service process, and the operation of a service component depends on its service processes. The service component can therefore be managed by monitoring the running status of its service processes and their physical characteristics (such as the usage of resources like CPU and memory).
Service processes in a service component generate running logs as they run, and a running log records the running information of its service process. The running log is organized line by line; each line records information such as the generation time, the log level, the service process, the class and code position of the executing program, and the specific log content. For a service process that has run into an error or a resource problem, the error is flagged in the corresponding running log and the details of the error are recorded (hereinafter: the error log).
When at least one service process generates an error log, the big data platform sends the alarm signal to the electronic device. More specifically, the big data platform is connected to the electronic device over a wired or wireless network, and the big data platform sends the alarm signal to the electronic device over that network.
Step S12: obtain the error log corresponding to the alarm signal, and analyze the alarm category corresponding to the error log.
In this embodiment, the alarm signal includes identification information of the service process that has run into an error or a resource problem. After receiving the alarm signal, the electronic device parses the identification information of the service process contained in the alarm signal, generates a log acquisition request according to the identification information obtained, and sends the log acquisition request to the big data platform. The log acquisition request is used to control the big data platform to send the error log generated by the corresponding service process to the electronic device.
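The request-building step described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not define the layout of the alarm signal or of the log acquisition request, so the `AlarmSignal` and `LogRequest` types, their fields, and `build_log_request` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmSignal:
    # Hypothetical field: identifier of the failing service process.
    process_id: str


@dataclass(frozen=True)
class LogRequest:
    # Hypothetical request asking the platform for one process's error log.
    process_id: str
    kind: str = "error_log"


def build_log_request(signal: AlarmSignal) -> LogRequest:
    """Read the process identifier from the alarm signal and wrap it in a request."""
    if not signal.process_id:
        raise ValueError("alarm signal carries no service process identifier")
    return LogRequest(process_id=signal.process_id)
```

How the request is transported to the big data platform (the wired or wireless network mentioned in the text) is left out of the sketch.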
In this embodiment, the electronic device stores an error information table in advance. The error information table includes multiple preset error keywords and multiple alarm categories, and each alarm category corresponds to at least one preset error keyword. Each alarm category produces its own kind of error log, and the error keywords contained in the error logs of a given alarm category can be determined from historical experience. Specifically, the electronic device collects past error logs and, for each error log, collates its error cause together with the keywords contained in the error details it records, thereby obtaining at least one error keyword for each error cause. The error causes are then classified to determine the alarm category each belongs to, which yields at least one error keyword for each alarm category (that is, the preset error keywords).
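As a rough sketch of how such a table could be assembled from historical data, the snippet below groups already-labeled (alarm category, keyword) pairs. The labeling itself, the function name `build_error_info_table`, and the sample category and keyword strings are assumptions for illustration; the patent does not say how the collation is carried out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_error_info_table(
    history: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Group historical (alarm_category, error_keyword) pairs by category.

    Each pair is assumed to come from a past error log whose cause has
    already been classified into an alarm category.
    """
    table = defaultdict(set)
    for category, keyword in history:
        table[category].add(keyword)
    # Sort keywords so the resulting table is deterministic.
    return {category: sorted(keywords) for category, keywords in table.items()}
```

Duplicated pairs collapse into a single keyword entry, so replaying the same historical log twice does not change the table.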
Therefore, in this embodiment, analyzing the alarm category corresponding to the error log in step S12 specifically includes:
Step S121: identify whether the error log contains at least one preset error keyword.
Error logs generated by different service processes have different formats (for example, a plain character string or key-value pairs). A preset rule can therefore be selected based on the format of the error log, and information can be extracted according to that rule to identify whether the error log contains a preset error keyword. For example, when the error log is in key-value format, the error log is traversed and information is extracted from it according to the predefined key-value format. In general, an error keyword in a key-value pair is separated from its value by "=", so an error keyword can be extracted by locating the "=" and then checked against the preset error keywords.
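The key-value extraction just described might look like the sketch below. It assumes that pairs are separated by whitespace and that the key sits to the left of "="; the sample keywords in `PRESET_KEYWORDS` and both function names are hypothetical and not taken from the patent.

```python
from typing import List

# Hypothetical preset error keywords; the real list would come from
# the error information table.
PRESET_KEYWORDS = {"OutOfMemoryError", "DiskFull", "SyntaxError"}


def extract_keys(error_log: str) -> List[str]:
    """Return the key of every whitespace-separated 'key=value' token."""
    keys = []
    for token in error_log.split():
        key, sep, _value = token.partition("=")
        if sep and key:  # keep only tokens that really contain '='
            keys.append(key)
    return keys


def find_preset_keywords(error_log: str) -> List[str]:
    """Keep only the extracted keys that are preset error keywords."""
    return [key for key in extract_keys(error_log) if key in PRESET_KEYWORDS]
```

For a plain-string log a different rule would be needed, for example a substring search for each preset keyword.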
Step S122: when the error log contains a preset error keyword, determine the alarm category corresponding to that keyword according to the error information table.
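A minimal sketch of the table lookup in step S122 follows, assuming the error information table is held as a dictionary from alarm category to keywords. The category names and keywords are invented for illustration, and a plain substring match stands in for whatever matching rule an implementation would actually use.

```python
from typing import Dict, List, Optional

# Hypothetical error information table: alarm category -> preset keywords.
ERROR_INFO_TABLE: Dict[str, List[str]] = {
    "environment_and_resource": ["OutOfMemoryError", "DiskFull"],
    "task_script": ["SyntaxError"],
    "daily_task_incomplete": ["DailyTaskTimeout"],
}

# Inverted index so that each keyword maps straight to its category.
KEYWORD_TO_CATEGORY: Dict[str, str] = {
    keyword: category
    for category, keywords in ERROR_INFO_TABLE.items()
    for keyword in keywords
}


def classify(error_log: str) -> Optional[str]:
    """Return the category of the first preset keyword found, else None."""
    for keyword, category in KEYWORD_TO_CATEGORY.items():
        if keyword in error_log:
            return category
    return None
```

Returning `None` for a log with no preset keyword leaves the fallback policy (for example, manual handling) to the caller; the patent does not specify one.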
In this embodiment, the error log is also marked with a flag bit, and the flag bit corresponds to an alarm level; the alarm level indicates that, when there are multiple error logs, error logs with a higher alarm level are processed first. For example, the flag bits include at least a first flag bit, a second flag bit, and a third flag bit. The first flag bit indicates alarm level one, the highest alarm level; the second flag bit indicates alarm level two, the intermediate alarm level; and the third flag bit indicates alarm level three, the lowest alarm level. Therefore, when more than one alarm signal is received, the method further includes the following step before step S121 identifies whether the error log contains at least one preset error keyword:
Step S120: identify the flag bit recorded in each error log, and determine the alarm level of each error log according to its flag bit.
The error logs are then checked for preset error keywords one after another in order of alarm level.
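The ordering in step S120 can be sketched as a sort on the flag bit. The `ErrorLog` type and its integer `flag` field (1 for level one, the highest, through 3 for level three) are assumptions made for the example; the patent does not say how the flag bit is encoded.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ErrorLog:
    text: str
    flag: int  # assumed encoding: 1 = level one (highest) ... 3 = level three


def order_by_alarm_level(logs: Iterable[ErrorLog]) -> List[ErrorLog]:
    """Return the logs with the highest alarm level (smallest flag) first.

    sorted() is stable, so logs that share a level keep their arrival order.
    """
    return sorted(logs, key=lambda log: log.flag)
```

The keyword check of step S121 would then be applied to the logs in the returned order.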
Step S13: determine the corresponding control instruction according to the alarm category obtained by the analysis.
In this embodiment, the electronic device also stores an instruction information table in advance. The instruction information table includes the multiple alarm categories and multiple control instructions, and each alarm category corresponds to one of the control instructions. The control instruction corresponding to each alarm category can therefore be determined from the instruction information table.
Step S14: send the determined control instruction to the big data platform, where the control instruction is used to control the big data platform to perform a corresponding operation.
In this embodiment, the alarm categories include at least a first-class alarm, a second-class alarm, and a third-class alarm. The first-class alarm covers big data platform environment and resource problems, the second-class alarm covers task script problems, and the third-class alarm covers daily tasks that were not completed. The first-class alarm category corresponds to a first control instruction, which is used to control the big data platform to rerun the task directly. The second-class alarm category corresponds to a second control instruction, which is used to control the big data platform to stop the task. The third-class alarm category corresponds to a third control instruction, which is used to control the big data platform to pause the task.
For example, when the business type of the big data platform is OLTP, the running log records the usage of resources such as CPU, concurrency metrics, and memory for each service component. When the business type of the big data platform is OLAP, the running log records the usage of resources such as disk I/O, network I/O, and memory for each service component. When a resource usage problem occurs in a service component, the service component generates alarm information. In this case, the first control instruction is used to control the big data platform to correct the resource problem and to control the service process to rerun the task.
As another example, the big data platform can receive, over a network, a task script uploaded from a mobile terminal (not shown), determine the corresponding service component, and, when that service component meets the task scheduling conditions, send the task script to it so that the service component executes the corresponding task and returns the execution result. An alarm for a task script problem arises when the task script uploaded by the mobile terminal is faulty, so that the service component encounters an error when running the script and generates alarm information. In this case, the second control instruction controls the service component to stop the task.
As a further example, suppose the task executed by a service component of the big data platform is a daily task. If the service component has not finished the daily task by the end of the day, alarm information is generated. In this case, the third control instruction controls the service component to pause the task. The service component then starts the next round of the task cycle on the following day, and so on until the task is completed. Of course, in other embodiments, the third-class alarm may also include alarms for weekly, monthly, or yearly tasks that were not completed.
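The instruction information table and the three category-to-instruction pairings described above can be sketched as a small lookup. The enum names and string values are hypothetical; only the pairing itself (first class to rerun, second class to stop, third class to pause) comes from the text.

```python
from enum import Enum


class AlarmCategory(Enum):
    ENVIRONMENT_AND_RESOURCE = 1  # first-class alarm
    TASK_SCRIPT = 2               # second-class alarm
    DAILY_TASK_INCOMPLETE = 3     # third-class alarm


class ControlInstruction(Enum):
    RERUN_TASK = "rerun"  # first control instruction
    STOP_TASK = "stop"    # second control instruction
    PAUSE_TASK = "pause"  # third control instruction


# Hypothetical instruction information table: one instruction per category.
INSTRUCTION_INFO_TABLE = {
    AlarmCategory.ENVIRONMENT_AND_RESOURCE: ControlInstruction.RERUN_TASK,
    AlarmCategory.TASK_SCRIPT: ControlInstruction.STOP_TASK,
    AlarmCategory.DAILY_TASK_INCOMPLETE: ControlInstruction.PAUSE_TASK,
}


def determine_instruction(category: AlarmCategory) -> ControlInstruction:
    """Look up the control instruction paired with an alarm category."""
    return INSTRUCTION_INFO_TABLE[category]
```

Keeping the pairing in a table rather than in branching code means a new alarm category only needs a new table entry.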
Fig. 2 is a schematic structural diagram of the platform alarm processing device 200 provided by a preferred embodiment of the present invention. In some embodiments, the platform alarm processing device 200 runs in an electronic device. The platform alarm processing device 200 may include multiple functional modules composed of program code segments. The program code of each segment of the platform alarm processing device 200 may be stored in the memory of the electronic device and executed by at least one processor to implement the alarm processing function.
In this embodiment, the platform alarm processing device 200 can be divided into multiple functional modules according to the functions it performs. As shown in Fig. 2, the platform alarm processing device 200 includes a receiving module 201, an analysis module 202, a determining module 203, and a sending module 204. A module, as the term is used in the present invention, is a series of computer program segments that is stored in the memory, can be executed by at least one processor, and performs a fixed function. The function of each module is described in detail below.
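One way to picture how the four modules cooperate is the sketch below, which wires them into a single class. It is an illustrative arrangement, not the patent's code: the class name, the injected `fetch_error_log` and `send_instruction` callables (standing in for the network exchange with the big data platform), and the dictionary-based tables are all assumptions.

```python
from typing import Callable, Dict, Optional


class PlatformAlarmProcessor:
    """Hypothetical wiring of the receiving, analysis, determining and sending roles."""

    def __init__(
        self,
        fetch_error_log: Callable[[str], str],
        send_instruction: Callable[[str, str], None],
        keyword_to_category: Dict[str, str],
        category_to_instruction: Dict[str, str],
    ) -> None:
        self._fetch_error_log = fetch_error_log
        self._send_instruction = send_instruction
        self._keyword_to_category = keyword_to_category
        self._category_to_instruction = category_to_instruction

    def _analyze(self, error_log: str) -> Optional[str]:
        # Analysis role: map the first preset keyword found to its category.
        for keyword, category in self._keyword_to_category.items():
            if keyword in error_log:
                return category
        return None

    def handle_alarm(self, process_id: str) -> Optional[str]:
        # Receiving role: obtain the error log of the failing process.
        error_log = self._fetch_error_log(process_id)
        category = self._analyze(error_log)
        if category is None:
            return None  # no preset keyword matched; nothing is sent
        # Determining role: look up the instruction for the category.
        instruction = self._category_to_instruction[category]
        # Sending role: deliver the instruction to the platform.
        self._send_instruction(process_id, instruction)
        return instruction
```

Passing the platform access in as callables keeps the sketch testable without a real cluster; an actual implementation would replace them with its own client code.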
The receiving module 201 is configured to receive at least one alarm signal from a big data platform.
The big data platform may be a distributed platform running multiple service components, and a service component may consist of one master node and at least one slave node. For example, for HDFS on the Hadoop distributed platform, the master node is the NameNode and the slave nodes are DataNodes. The master node and the slave nodes of a service component can each be regarded as a service process, and the operation of a service component depends on its service processes. The service component can therefore be managed by monitoring the running status of its service processes and their physical characteristics (such as the usage of resources like CPU and memory).
Service processes in a service component generate running logs as they run, and a running log records the running information of its service process. The running log is organized line by line; each line records information such as the generation time, the log level, the service process, the class and code position of the executing program, and the specific log content. For a service process that has run into an error or a resource problem, the error is flagged in the corresponding running log and the details of the error are recorded (hereinafter: the error log).
When at least one service process generates an error log, the big data platform sends the alarm signal to the electronic device. More specifically, the big data platform is connected to the electronic device over a wired or wireless network, and the big data platform sends the alarm signal to the electronic device over that network.
The receiving module 201 is further configured to obtain the error log corresponding to the alarm signal, and the analysis module 202 is configured to analyze the alarm category corresponding to the error log.
In this embodiment, the alarm signal includes identification information of the service process that has run into an error or a resource problem. After the receiving module 201 receives the alarm signal, the analysis module 202 parses the identification information of the service process contained in the alarm signal, generates a log acquisition request according to the identification information obtained, and sends the log acquisition request to the big data platform through the sending module 204. The log acquisition request is used to control the big data platform to send the error log generated by the corresponding service process to the electronic device.
In this embodiment, the electronic device stores an error information table in advance. The error information table includes multiple preset error keywords and multiple alarm categories, and each alarm category corresponds to at least one preset error keyword. Each alarm category produces its own kind of error log, and the error keywords contained in the error logs of a given alarm category can be determined from historical experience. Specifically, the analysis module 202 collects past error logs and, for each error log, collates its error cause together with the keywords contained in the error details it records, thereby obtaining at least one error keyword for each error cause. The analysis module 202 then classifies the error causes to determine the alarm category each belongs to, which yields at least one error keyword for each alarm category (that is, the preset error keywords).
Therefore, in this embodiment, the analysis module 202 identifies whether the error log contains at least one preset error keyword and, when the error log contains a preset error keyword, determines the alarm category corresponding to that keyword according to the error information table.
Error logs generated by different service processes have different formats (for example, a plain character string or key-value pairs). The analysis module 202 can therefore select a preset rule based on the format of the error log and extract information according to that rule to identify whether the error log contains a preset error keyword. For example, when the error log is in key-value format, the error log is traversed and information is extracted from it according to the predefined key-value format. In general, an error keyword in a key-value pair is separated from its value by "=", so the analysis module 202 can extract an error keyword by locating the "=" and then check it against the preset error keywords.
In this embodiment, the error log is also marked with a flag bit. The flag bit corresponds to an alarm level, and the alarm level indicates that, when there are multiple error logs, the error log with the higher alarm level is processed first. For example, the flag bits include at least a first flag bit, a second flag bit and a third flag bit. The first flag bit indicates alarm level one, the highest alarm level; the second flag bit indicates alarm level two, the intermediate level; the third flag bit indicates alarm level three, the lowest level. Therefore, when more than one alarm signal is received, before identifying whether the error log contains at least one preset error keyword, the analysis module 202 also identifies the flag bit recorded in each error log and determines the alarm level of the error log according to the flag bit.
Here, the error logs are checked for preset error keywords one by one in order of alarm level.
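A minimal sketch of this ordering, assuming the flag bit is stored as the integer 1, 2 or 3 (the patent does not specify an encoding) and using hypothetical field names:

```python
from dataclasses import dataclass

# Assumed encoding: flag bit 1, 2, 3 maps to alarm level one (highest),
# two (intermediate) and three (lowest).
FLAG_TO_LEVEL = {1: 1, 2: 2, 3: 3}


@dataclass
class ErrorLog:
    process_id: str  # hypothetical identifier of the service process
    flag_bit: int    # flag bit marked on the error log
    content: str


def order_by_alarm_level(logs: list[ErrorLog]) -> list[ErrorLog]:
    """Return the logs with alarm level one first.

    sorted() is stable, so logs of equal level keep their arrival order.
    """
    return sorted(logs, key=lambda log: FLAG_TO_LEVEL[log.flag_bit])
```

Keyword identification would then iterate over the returned list from front to back.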
The determining module 203 is configured to determine the corresponding control instruction according to the alarm category obtained by the analysis module 202.
In this embodiment, an instruction information table is also pre-stored in the electronic device. The instruction information table includes the multiple alarm categories and multiple control instructions, and each alarm category corresponds to one of the control instructions. The determining module 203 can therefore determine the control instruction corresponding to each alarm category according to the instruction information table.
The sending module 204 sends the control instruction determined by the determining module 203 to the big data platform; the control instruction is used to control the big data platform to execute a corresponding operation.
In this embodiment, the alarm categories include at least a first-category alarm, a second-category alarm and a third-category alarm. The first-category alarm includes alarms about environment and resource problems of the big data platform, the second-category alarm includes alarms about task script problems, and the third-category alarm includes alarms about daily tasks that were not completed. The first alarm category corresponds to a first control instruction, which controls the big data platform to rerun the task directly. The second alarm category corresponds to a second control instruction, which controls the big data platform to stop the task. The third alarm category corresponds to a third control instruction, which controls the big data platform to pause the task.
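The instruction information table for the three categories above could be represented as a simple lookup. The enum names and string values below are illustrative assumptions; the patent only states which category maps to which kind of instruction.

```python
from enum import Enum


class AlarmCategory(Enum):
    ENVIRONMENT_OR_RESOURCE = 1  # first-category alarm
    TASK_SCRIPT = 2              # second-category alarm
    DAILY_TASK_INCOMPLETE = 3    # third-category alarm


class ControlInstruction(Enum):
    RERUN_TASK = "rerun"  # first control instruction
    STOP_TASK = "stop"    # second control instruction
    PAUSE_TASK = "pause"  # third control instruction


# Instruction information table: each alarm category maps to one instruction.
INSTRUCTION_TABLE = {
    AlarmCategory.ENVIRONMENT_OR_RESOURCE: ControlInstruction.RERUN_TASK,
    AlarmCategory.TASK_SCRIPT: ControlInstruction.STOP_TASK,
    AlarmCategory.DAILY_TASK_INCOMPLETE: ControlInstruction.PAUSE_TASK,
}


def determine_instruction(category: AlarmCategory) -> ControlInstruction:
    """Look up the control instruction for an alarm category."""
    return INSTRUCTION_TABLE[category]
```

Keeping the mapping in a table rather than in branching code matches the description of a pre-stored table and lets further categories be added without changing the lookup.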
For example, when the service type of the big data platform is OLTP, the running log records the usage of resources such as CPU, concurrency metrics and memory for each service component. When the service type of the big data platform is OLAP, the running log records the usage of resources such as disk I/O, network I/O and memory for each service component. When a service component encounters a resource usage problem, it generates alarm information. In this case, the first control instruction controls the big data platform to correct the resource problem and controls the service process to rerun the task.
As another example, the big data platform can receive, over a network, a task script uploaded from a mobile terminal (not shown) and determine the corresponding service component. When the service component satisfies the task scheduling condition, the task script is sent to the service component so that the service component executes the corresponding task and returns the execution result. An alarm about a task script problem arises when the task script uploaded by the mobile terminal is faulty, so that an error occurs when the service component runs the script and alarm information is generated. In this case, the second control instruction controls the service component to stop the task.
As a further example, the task executed by a service component of the big data platform is a daily task; if the service component has not completed the daily task within the day, alarm information is generated. In this case, the third control instruction controls the service component to pause the task. The service component then starts the next work cycle on the following day, and so on until the task is completed. Of course, in other embodiments, the third-category alarm may also include alarms about weekly, monthly or yearly tasks that were not completed.
As shown in Fig. 3, Fig. 3 is a schematic structural diagram of the electronic device 1 that implements the platform alert processing method in a preferred embodiment of the present invention. The electronic device 1 includes a memory 101, a processor 102, and a computer program 103, such as a platform alert processing program, that is stored in the memory 101 and can run on the processor 102.
When executing the computer program 103, the processor 102 implements the steps of the platform alert processing method in the above embodiment:
Step S11: receive at least one alarm signal from the big data platform;
The big data platform may be a distributed platform running multiple service components, and each service component may consist of one master node and at least one slave node. For example, for HDFS on the Hadoop distributed platform, the master node is the NameNode and the slave nodes are DataNodes. The master node and the slave nodes of a service component can each act as a service process, and the operation of a service component depends on its service processes. The service component can therefore be managed by monitoring the running state of its service processes and their physical characteristics (such as the usage of resources like CPU and memory).
A service process in a service component generates a corresponding running log while it runs, and the running log records the operating information of the service process. The running log is organized by line; each line records information such as the generation time, the log level, the service process, the class of the executing program, the code position and the specific log content. For a service process that runs into an error or a resource problem, the corresponding running log carries an error mark and records the details of the error (hereinafter: the error log).
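To make the line-oriented running log concrete, the sketch below parses one assumed layout: space-separated fields in the order date, time, log level, service process, class, code position, then free-text content. The patent names these fields but not their order or separator, and it does not say how the error mark is encoded; treating the level "ERROR" as the mark is an assumption, as are all names.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    time: str      # generation time
    level: str     # log level
    process: str   # service process
    cls: str       # class of the executing program
    position: str  # code position
    content: str   # specific log content


def parse_log_line(line: str) -> LogRecord:
    """Split one line of the assumed layout into its six recorded fields."""
    date, clock, level, process, cls, position, content = line.split(" ", 6)
    return LogRecord(f"{date} {clock}", level, process, cls, position, content)


def error_records(running_log: str) -> list[LogRecord]:
    """Keep only the lines carrying the (assumed) error mark."""
    records = [parse_log_line(l) for l in running_log.splitlines() if l.strip()]
    return [r for r in records if r.level == "ERROR"]
```

A line with fewer than seven space-separated tokens raises ValueError in this sketch; a real parser would need a rule per service process, since the formats differ.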
When at least one service process generates an error log, the big data platform sends the alarm signal to the electronic device. More specifically, the big data platform is connected to the electronic device through a wired or wireless network and sends the alarm signal to the electronic device over that network.
Step S12: obtain the error log corresponding to the alarm signal, and analyze the alarm category corresponding to the error log;
In this embodiment, the alarm signal includes identification information of the service process that ran into an error or a resource problem. After receiving the alarm signal, the electronic device parses the identification information of the service process contained in the alarm signal, generates a log acquisition request according to the identification information obtained, and sends the log acquisition request to the big data platform. The log acquisition request is used to control the big data platform to send the error log generated by the corresponding service process to the electronic device.
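The request-building step might look like the following sketch. The JSON encoding and the field names `process_id` and `type` are assumptions made for illustration; the patent does not define a message format.

```python
import json


def build_log_request(alarm_signal: dict) -> str:
    """Build a log acquisition request from the alarm signal.

    The alarm signal is assumed to carry the service process
    identification under the hypothetical key 'process_id'.
    """
    process_id = alarm_signal["process_id"]
    request = {"type": "log_acquisition", "process_id": process_id}
    return json.dumps(request, sort_keys=True)
```

The resulting string would be sent to the big data platform, which is expected to reply with the error log of the named service process.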
In this embodiment, the electronic device pre-stores an error information table. The error information table includes multiple preset error keywords and multiple alarm categories, and each alarm category corresponds to at least one preset error keyword. Each alarm category gives rise to corresponding error logs. The error keywords contained in the error logs of a given alarm category can be determined from historical experience. Specifically, the electronic device collects past error logs and collates, for each error log, the cause of the error and the keywords contained in the error details recorded in the log, thereby obtaining at least one error keyword for each error cause. The error causes are then classified to determine the alarm category each belongs to, which yields at least one error keyword (that is, preset error keyword) for each alarm category.
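As a rough sketch of how such a table could be assembled from history, assume each past error log has already been reduced to a pair of (keywords found in the error details, error cause) and that the cause-to-category classification is given. Both inputs and all names are hypothetical.

```python
from collections import defaultdict


def build_error_info_table(
    history: list[tuple[list[str], str]],
    cause_to_category: dict[str, str],
) -> dict[str, set[str]]:
    """Group the keywords of past error logs by alarm category.

    history: (keywords, error cause) per past error log.
    cause_to_category: classification of each error cause.
    """
    table: dict[str, set[str]] = defaultdict(set)
    for keywords, cause in history:
        category = cause_to_category[cause]
        table[category].update(keywords)
    return dict(table)
```

Each resulting set holds the preset error keywords of one alarm category, which is the shape the error information table is described as having.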
Therefore, in this embodiment, analyzing the alarm category corresponding to the error log in step S12 specifically includes:
Step S121: identify whether the error log contains at least one preset error keyword;
The error logs generated by different service processes differ in format (for example, they may take the form of a character string or of key-value pairs). A corresponding preset rule can therefore be selected based on the format of the error log, and information is extracted according to that rule so as to identify whether the error log contains a preset error keyword. For example, when the error log is in key-value format, the error log is traversed and information is extracted from it according to the predefined key-value format. In general, an error keyword in a key-value pair is separated from its corresponding value by "=", so the error keyword can be extracted by locating "=" and then judged as to whether it is a preset error keyword.
Step S122: when the error log contains a preset error keyword, determine the alarm category corresponding to that preset error keyword according to the error information table.
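Step S122 amounts to a reverse lookup in the error information table. The sketch below assumes the keywords have already been extracted in step S121; the table contents and category labels are made up for illustration.

```python
# Hypothetical error information table: alarm category -> preset keywords.
ERROR_INFO_TABLE = {
    "first": {"OutOfMemoryError", "DiskFull"},
    "second": {"ScriptSyntaxError"},
    "third": {"DailyTaskIncomplete"},
}


def categories_for(keywords: list[str]) -> list[str]:
    """Return every alarm category that owns at least one extracted keyword.

    Categories come back in table order; keywords that are not preset
    error keywords are ignored.
    """
    found = []
    for category, preset in ERROR_INFO_TABLE.items():
        if any(kw in preset for kw in keywords):
            found.append(category)
    return found
```

An empty list indicates that none of the extracted keywords is a preset error keyword.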
In this embodiment, the error log is also marked with a flag bit. The flag bit corresponds to an alarm level, and the alarm level indicates that, when there are multiple error logs, the error log with the higher alarm level is processed first. For example, the flag bits include at least a first flag bit, a second flag bit and a third flag bit. The first flag bit indicates alarm level one, the highest alarm level; the second flag bit indicates alarm level two, the intermediate level; the third flag bit indicates alarm level three, the lowest level. Therefore, when more than one alarm signal is received, the method further includes, before step S121 identifies whether the error log contains at least one preset error keyword:
Step S120: identify the flag bit recorded in each error log, and determine the alarm level of the error log according to the flag bit;
Here, the error logs are checked for preset error keywords one by one in order of alarm level.
Step S13: determine the corresponding control instruction according to the alarm category obtained by the analysis.
In this embodiment, an instruction information table is also pre-stored in the electronic device. The instruction information table includes the multiple alarm categories and multiple control instructions, and each alarm category corresponds to one of the control instructions. The control instruction corresponding to each alarm category can therefore be determined according to the instruction information table.
Step S14: send the determined control instruction to the big data platform; the control instruction is used to control the big data platform to execute a corresponding operation.
In this embodiment, the alarm categories include at least a first-category alarm, a second-category alarm and a third-category alarm. The first-category alarm includes alarms about environment and resource problems of the big data platform, the second-category alarm includes alarms about task script problems, and the third-category alarm includes alarms about daily tasks that were not completed. The first alarm category corresponds to a first control instruction, which controls the big data platform to rerun the task directly. The second alarm category corresponds to a second control instruction, which controls the big data platform to stop the task. The third alarm category corresponds to a third control instruction, which controls the big data platform to pause the task.
For example, when the service type of the big data platform is OLTP, the running log records the usage of resources such as CPU, concurrency metrics and memory for each service component. When the service type of the big data platform is OLAP, the running log records the usage of resources such as disk I/O, network I/O and memory for each service component. When a service component encounters a resource usage problem, it generates alarm information. In this case, the first control instruction controls the big data platform to correct the resource problem and controls the service process to rerun the task.
As another example, the big data platform can receive, over a network, a task script uploaded from a mobile terminal (not shown) and determine the corresponding service component. When the service component satisfies the task scheduling condition, the task script is sent to the service component so that the service component executes the corresponding task and returns the execution result. An alarm about a task script problem arises when the task script uploaded by the mobile terminal is faulty, so that an error occurs when the service component runs the script and alarm information is generated. In this case, the second control instruction controls the service component to stop the task.
As a further example, the task executed by a service component of the big data platform is a daily task; if the service component has not completed the daily task within the day, alarm information is generated. In this case, the third control instruction controls the service component to pause the task. The service component then starts the next work cycle on the following day, and so on until the task is completed. Of course, in other embodiments, the third-category alarm may also include alarms about weekly, monthly or yearly tasks that were not completed.
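Steps S11 to S14 can be strung together as in the following sketch. It is not the patented implementation: the platform is replaced by an in-memory stand-in, keyword matching is reduced to a substring test, and every name, keyword and instruction string is a hypothetical placeholder.

```python
class FakePlatform:
    """In-memory stand-in for the big data platform (an assumption)."""

    def __init__(self, error_logs: dict[str, str]):
        self.error_logs = error_logs  # process id -> error log text
        self.received = []            # control instructions sent back

    def fetch_error_log(self, process_id: str) -> str:
        return self.error_logs[process_id]

    def execute(self, process_id: str, instruction: str) -> None:
        self.received.append((process_id, instruction))


# Hypothetical error information table and instruction information table.
KEYWORD_TO_CATEGORY = {
    "OutOfMemoryError": "first",
    "ScriptSyntaxError": "second",
    "DailyTaskIncomplete": "third",
}
CATEGORY_TO_INSTRUCTION = {
    "first": "rerun_task",
    "second": "stop_task",
    "third": "pause_task",
}


def handle_alarm(platform: FakePlatform, alarm_signal: dict):
    """Process one alarm signal; return the instruction sent, or None."""
    process_id = alarm_signal["process_id"]           # S11: signal received
    error_log = platform.fetch_error_log(process_id)  # S12: obtain error log
    category = next(
        (c for kw, c in KEYWORD_TO_CATEGORY.items() if kw in error_log),
        None,
    )                                                 # S12: analyze category
    if category is None:
        return None  # no preset error keyword found
    instruction = CATEGORY_TO_INSTRUCTION[category]   # S13: determine
    platform.execute(process_id, instruction)         # S14: send
    return instruction
```

In this sketch an error log without any preset error keyword is simply left unhandled; the patent does not say what should happen in that case.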
Alternatively, when executing the computer program 103, the processor 102 implements the functions of the modules/units in the above platform alert processing device embodiment, for example the units 201-204 in Fig. 2.
The embodiments of the present invention can analyze the alarms raised when a service component of the big data platform runs into an error or a resource problem, classify the causes of the alarms, and execute a corresponding processing strategy according to the category. This helps improve the efficiency of alarm processing and automates it. Furthermore, because alarms can be handled in time, the multiple service components of the big data platform can cooperate and interact smoothly, and an abnormal service component is prevented from affecting the entire data processing procedure.
Illustratively, the computer program 103 may be divided into one or more modules/units, which are stored in the memory 101 and executed by the processor 102 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions; the instruction segments describe the execution of the computer program 103 in the electronic device 1. For example, the computer program 103 may be divided into the receiving module 201, the analysis module 202, the determining module 203 and the sending module 204 in Fig. 2.
The electronic device 1 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. Those skilled in the art will understand that the schematic diagram is only an example of the electronic device 1 and does not limit it; the electronic device 1 may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 1 may also include input/output devices, network access devices, buses and the like.
The processor 102 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor 102 may be any conventional processor. The processor 102 is the control center of the electronic device 1 and connects the various parts of the entire electronic device 1 through various interfaces and lines.
The memory 101 may be used to store the computer program 103 and/or the modules/units. The processor 102 implements the various functions of the electronic device 1 by running or executing the computer program and/or modules/units stored in the memory 101 and by calling data stored in the memory 101. The memory 101 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the electronic device 1 (such as audio data or a phone book). In addition, the memory 101 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the integrated modules/units of the electronic device 1 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures of the methods in the above embodiments may also be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the electronic device embodiment described above is only schematic; the division into units is only a division by logical function, and other divisions are possible in actual implementation.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in every respect as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are intended to be embraced by the present invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or electronic devices recited in a device claim may also be implemented by one unit or electronic device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or replaced with equivalents without departing from the spirit and scope of the technical solution of the present invention.

Claims (9)

1. A platform alert processing method, characterized by comprising:
receiving at least one alarm signal from a big data platform;
obtaining the error log corresponding to the alarm signal, and analyzing the alarm category corresponding to the error log;
determining a corresponding control instruction according to the alarm category obtained by the analysis; and
sending the determined control instruction to the big data platform, the control instruction being used to control the big data platform to execute a corresponding operation.
2. The platform alert processing method according to claim 1, characterized in that the alarm signal includes identification information of a service process in the big data platform that ran into an error or a resource problem, and obtaining the error log corresponding to the alarm signal comprises:
parsing the identification information of the service process contained in the alarm signal;
generating a log acquisition request according to the identification information obtained by the parsing;
sending the log acquisition request to the big data platform, the log acquisition request being used to control the big data platform to send the error log generated by the corresponding service process to the electronic device.
3. The platform alert processing method according to claim 1, characterized in that analyzing the alarm category corresponding to the error log comprises:
identifying whether the error log contains at least one preset error keyword; and
when the error log contains the preset error keyword, determining the alarm category corresponding to the preset error keyword according to an error information table, wherein the error information table includes multiple preset error keywords and multiple alarm categories, and each alarm category corresponds to at least one preset error keyword.
4. The platform alert processing method according to claim 3, characterized in that the error log is marked with a flag bit, the flag bit corresponds to an alarm level, the alarm level indicates that, when there are multiple error logs, the error log with the higher alarm level is processed first, and, when more than one alarm signal is received, the method further comprises, before identifying whether the error log contains at least one preset error keyword:
identifying the flag bit recorded in each error log, and determining the alarm level of the error log according to the flag bit, wherein the error logs are checked for preset error keywords one by one in order of alarm level.
5. The platform alert processing method according to claim 3, characterized in that identifying whether the error log contains at least one preset error keyword comprises selecting a corresponding preset rule based on the format of the error log and extracting information according to the preset rule, so as to identify whether the error log contains a preset error keyword.
6. The platform alert processing method according to claim 1, characterized in that the alarm categories include at least a first-category alarm, a second-category alarm and a third-category alarm; the first-category alarm includes alarms about environment and resource problems of the big data platform, the second-category alarm includes alarms about task script problems, and the third-category alarm includes alarms about daily tasks that were not completed; the first alarm category corresponds to a first control instruction, the first control instruction being used to control the big data platform to rerun the task directly; the second alarm category corresponds to a second control instruction, the second control instruction being used to control the big data platform to stop the task; and the third alarm category corresponds to a third control instruction, the third control instruction being used to control the big data platform to pause the task.
7. A platform alert processing device, characterized by comprising:
a receiving module, configured to receive at least one alarm signal from a big data platform and further configured to obtain the error log corresponding to the alarm signal;
an analysis module, configured to analyze the alarm category corresponding to the error log;
a determining module, configured to determine a corresponding control instruction according to the alarm category obtained by the analysis module; and
a sending module, configured to send the control instruction determined by the determining module to the big data platform, the control instruction being used to control the big data platform to execute a corresponding operation.
8. An electronic device comprising a processor and a memory, characterized in that a platform alert processing program is stored in the memory, and the processor is configured to execute the platform alert processing program to implement the platform alert processing method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that a platform alert processing program is stored on the computer-readable storage medium, and the platform alert processing program, when executed by a processor, implements the platform alert processing method according to any one of claims 1 to 6.
CN201811151626.7A 2018-09-29 2018-09-29 Platform alert processing method, device, electronic device and storage medium Pending CN109542737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811151626.7A CN109542737A (en) 2018-09-29 2018-09-29 Platform alert processing method, device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN109542737A true CN109542737A (en) 2019-03-29

Family

ID=65843669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811151626.7A Pending CN109542737A (en) 2018-09-29 2018-09-29 Platform alert processing method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN109542737A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209644A (en) * 2019-05-21 2019-09-06 上海易点时空网络有限公司 The method, apparatus and system of log management
CN111124859A (en) * 2019-12-13 2020-05-08 北京浪潮数据技术有限公司 Log processing method, device, equipment and storage medium
CN111198850A (en) * 2019-12-14 2020-05-26 深圳猛犸电动科技有限公司 Log message processing method and device and Internet of things platform
CN112882920A (en) * 2021-04-29 2021-06-01 云账户技术(天津)有限公司 Alarm policy verification method and device, electronic equipment and readable storage medium
CN113485886A (en) * 2021-06-25 2021-10-08 青岛海尔科技有限公司 Alarm log processing method and device, storage medium and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102981943A (en) * 2012-10-29 2013-03-20 新浪技术(中国)有限公司 Method and system for monitoring application logs
US20140149525A1 (en) * 2012-11-28 2014-05-29 Electronics And Telecommunications Research Institute Method and apparatus for transmitting and receiving instant message
CN105337765A (en) * 2015-10-10 2016-02-17 上海新炬网络信息技术有限公司 Distributed hadoop cluster fault automatic diagnosis and restoration system
CN105550103A (en) * 2015-12-03 2016-05-04 泰华智慧产业集团股份有限公司 Custom test script based automated testing method
CN107123314A (en) * 2017-04-24 2017-09-01 努比亚技术有限公司 A kind of method for realizing alarming processing, system, terminal and equipment
CN107612740A (en) * 2017-09-30 2018-01-19 武汉光谷信息技术股份有限公司 A kind of daily record monitoring system and method under distributed environment
CN107729206A (en) * 2017-09-04 2018-02-23 上海斐讯数据通信技术有限公司 Real-time analysis method, system and the computer-processing equipment of alarm log

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209644A (en) * 2019-05-21 2019-09-06 上海易点时空网络有限公司 The method, apparatus and system of log management
CN111124859A (en) * 2019-12-13 2020-05-08 北京浪潮数据技术有限公司 Log processing method, device, equipment and storage medium
CN111198850A (en) * 2019-12-14 2020-05-26 深圳猛犸电动科技有限公司 Log message processing method and device and Internet of things platform
CN112882920A (en) * 2021-04-29 2021-06-01 云账户技术(天津)有限公司 Alarm policy verification method and device, electronic equipment and readable storage medium
CN112882920B (en) * 2021-04-29 2021-06-29 云账户技术(天津)有限公司 Alarm policy verification method and device, electronic equipment and readable storage medium
CN113485886A (en) * 2021-06-25 2021-10-08 青岛海尔科技有限公司 Alarm log processing method and device, storage medium and electronic device
CN113485886B (en) * 2021-06-25 2023-07-21 青岛海尔科技有限公司 Alarm log processing method and device, storage medium and electronic device

Similar Documents

Publication Publication Date Title
US11165806B2 (en) Anomaly detection using cognitive computing
CN109542737A (en) Platform alert processing method, device, electronic device and storage medium
US11544721B2 (en) Supporting automation of customer service
US11132358B2 (en) Candidate name generation
CN110347888B (en) Order data processing method and device and storage medium
CN113626241B (en) Abnormality processing method, device, equipment and storage medium for application program
CN114244611B (en) Abnormal attack detection method, device, equipment and storage medium
CN111582341A (en) User abnormal operation prediction method and device
CN113515434A (en) Abnormity classification method, abnormity classification device, abnormity classification equipment and storage medium
US11568344B2 (en) Systems and methods for automated pattern detection in service tickets
CN114580933A (en) Event distribution method and device, storage medium and electronic equipment
US11783221B2 (en) Data exposure for transparency in artificial intelligence
CN112541447A (en) Machine model updating method, device, medium and equipment
CN112801145A (en) Safety monitoring method and device, computer equipment and storage medium
CN109558222A (en) Batch service process monitoring method, device, computer and readable storage medium
US20210092159A1 (en) System for the prioritization and dynamic presentation of digital content
CN114330720A (en) Knowledge graph construction method and device for cloud computing and storage medium
CN112148461A (en) Application scheduling method and device
CN110806961A (en) Intelligent early warning method and system and recommendation system
CN113537519A (en) Method and device for identifying abnormal equipment
US20190238400A1 (en) Network element operational status ranking
CN113434404B (en) Automatic service verification method and device for verifying reliability of disaster recovery system
CN115858325B (en) Project log adjusting method, device, equipment and storage medium
US11551006B2 (en) Removal of personality signatures
CN117215747A (en) Client-oriented abnormal operation processing method, storage medium and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190329
